# packages ----
library(tidycensus)
library(tigris)
library(tidyverse)
library(sf)
library(areal)
library(janitor)
library(here)
library(units)
library(knitr)
library(kableExtra)
library(tmap)
library(patchwork)
library(scales)
library(digest)
library(mapview)
library(biscale)
library(cowplot)
library(glue)
library(ggtext)
# conflicts ----
library(conflicted)
conflicts_prefer(dplyr::filter)
# options ----
options(scipen = 999) # turn off scientific notation
options(tigris_use_cache = TRUE) # use data caching for tigris
# reference system ----
## set common projected coordinate reference system used throughout this analysis
crs_projected <- 3310 # see: https://epsg.io/3310Estimating Demographics of Custom Spatial Features
Accessing U.S. Census Bureau Data & Calculating Weighted Averages with Areal- and Population-Weighted Interpolation
1 Background
For comments, suggestions, corrections, or questions on anything below, contact david.altare@waterboards.ca.gov, or open an issue on github.
This is a draft / work in progress – some parts are still under development, and existing parts may change.
This document provides an example of how to use tools available from the R programming language (R Core Team 2023) to estimate characteristics of any given target spatial area(s) (e.g., neighborhoods, project boundaries, water supplier service areas, etc.) based on data from a source dataset containing the characteristic data of interest (e.g., census data, CalEnvrioScreen scores, etc.), especially when the boundaries of the source and target areas overlap but don’t necessarily align with each other. It also provides some brief background on the various types of data available from the U.S Census Bureau, and links to a few places to find more in-depth information.
This particular example estimates demographic characteristics of community water systems in the Sacramento County area (the target dataset). It uses the tidycensus R package (Walker and Herman 2023) to access selected demographic data from the U.S. Census Bureau (the source dataset) for census units whose spatial extent covers those water systems’ service areas, then uses the sf package (Pebesma and Bivand 2023) package (for working with spatial data) and the tidyverse collection of packages (Wickham et al. 2019) (for general data cleaning and transformation) to estimate some demographic characteristics of each water system based on that census data. It also uses the areal R package (Prener et al. 2019) to check some of the results, and as general guidance on the principles and techniques for implementing areal interpolation.
This example is just intended to be a simplified demonstration of a possible workflow. For a real analysis, additional steps and considerations – that may not be covered here – may be needed to deal with data inconsistencies (e.g., missing or incomplete data), required level of precision and acceptable assumptions (e.g. more fine-grained datasets or more sophisticated techniques could be used to estimate/model population distributions), or other project-specific issues that might arise.
2 Setup
The code block below loads required packages for this analysis, and sets some user-defined options and defaults. If they aren’t already installed on your computer, you can install them with the R command install.packages('package-name') (and replace package-name with the name of the package you want to install).
3 Census Data Overview
This section provides some brief background on the various types of data available from the U.S. Census Bureau (a later section - Section 5 - demonstrates how to retrieve data from the U.S. Census Bureau using the tidycensus R package). Most of the information covered here comes from the book Analyzing US Census Data: Methods, Maps, and Models in R, which is a great source of information if you’d like more detail about any of the topics below (Walker 2023b).
If you’re already familiar with Census data and want to skip this overview, go directly to the next section: Section 4
Different census products/surveys contain data on different variables, at different geographic scales, over varying periods of time, and with varying levels of certainty. Therefore, there are a number of judgement calls to make when determining which type of census data to use for an analysis – e.g., which data product to use (Decennial Census or American Community Survey), which geographic scale to use (e.g., Block, Block Group, Tract, etc.), what time frame to use, which variables to assess, etc.
More detailed information about U.S. Census Bureau’s data products and other topics mentioned below is available here.
3.1 Census Unit Geography / Hierarchy
Publicly available datasets from the U.S Census Bureau generally consist of individual survey responses aggregated to defined census units (e.g., census tracts) that cover varying geographic scales. Some of these units are nested and can be neatly aggregated (e.g., each census tract is composed of a collection of block groups, and each block group is composed of a collection of blocks), while other census units are outside this hierarchy (e.g., Zip Code Tabulation Areas don’t coincide with any other census unit). Figure 1 shows the relationship of all of the various census units.
Commonly used census statistical units like tracts and block groups have target population size ranges, and can be adjusted every 10 years (with the decennial census) based on population changes. For example, all ACS 5-year datasets prior to 2020 use the 2010 boundaries for tracts, block groups, and blocks, and all ACS 5-year datasets from 2020 onward (presumably through 2029) use the 2020 boundaries for those units. Census tracts are generally around 4,000 people, with a range from about 1,200 to 8,000, and block groups generally contain 600 to 3,000 people. Blocks are the smallest census units, and are “areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads”. For example, a census block may be “a city block bounded on all sides by streets”, while “blocks in suburban and rural areas may be larger, more irregular in shape, and bounded by a variety of features, such as roads, streams, and transmission lines”.
Census boundaries can change over time. Commonly used statistical units like tracts, block groups, and blocks tend to be revised every 10 years (with the decennial census), so it’s important to use a census boundary dataset that matches the version of the census demographic data you’re retrieving; otherwise, the demographic data may not match geographic areas in your boundary dataset. In some cases, a census unit that exists in a given year of the census data may not exist at all in a different year’s dataset, because census units can be split or merged when boundaries are revised.
For a list of the different geographic units available for each of the different census products/surveys (see Section 3.2) that can be accessed via the tidycensus package, go here.
3.2 Census Datasets / Surveys
The Decennial Census is conducted every 10 years, and is intended to provide a complete count of the US population and assist with political redistricting. As a result, it collects a relatively limited set of basic demographic data, but (should) provide a high degree of precision (i.e., in general it should provide exact counts). It is available for geographic units down to the census block (the smallest census unit available – see Section 3.1). For information about existing and planned future releases of 2020 census data products, go here.
The American Community Survey (ACS) provides a much larger array of demographic information than the Decennial Census, and is updated more frequently. The ACS is based on a sample of the population (rather than a count of the entire population, as in the Decennial Census), so it represents estimated values rather than precise counts; therefore, each data point is available as an estimate (typically labeled with an “E” in census variable codes, which are discussed in Section 3.3 ) along with an associated margin of error (typically labeled with “M” or “MOE” in census variable codes) around its estimated value.
The ACS is available in two formats. The 5-year ACS is a rolling average of 5 years of data (e.g., the 2021 5-year ACS dataset is an average of the ACS data from 2017 through 2021), and is generally available for geographic units down to the census block group (though some 5-year ACS data may only be available at less granular levels). The 1-year ACS provides data for a single year, and is only available for geographies with population greater than 65,000 (e.g., large cities and counties). Therefore, only the 5-year ACS will be useful for any analysis at a relatively fine scale (e.g., anything that requires data at or more detailed than the census tract level, or any analysis that considers smaller counties/cities – by definition, census tracts always contain significantly fewer than 65,000 people).
In addition to the Decennial Census and ACS data, a number of other census data products/surveys are also available. For example, see the censusapi R package (here or here) for access to over 300 census API endpoints. For historical census data, see the discussion here on using NHGIS, IPUMS, and the ipumsr package.
3.3 Census Variables / Codes
Each census product collects data for many different demographic variables, and each variable is generally associated with an identifier code. In order to access census data programmatically, you often need to know the code associated with each variable of interest. When determining which variables to use, you need to consider what census product contains those variables (see Section 3.2) and how they differ in terms of time frame, precision, spatial granularity (see Section 3.1), etc.
The tidycensus package offers a convenient generic way to search for variables across different census products using the load_variables() function, as described here.
The following websites may also be helpful for exploring the various census data products and finding the variable names and codes they contain:
Census Reporter (for ACS data): https://censusreporter.org/ (especially https://censusreporter.org/topics/table-codes/)
Census Bureau’s list of variable codes, e.g.:
2020 Census codes: https://api.census.gov/data/2020/dec/pl/variables.html
2022 ACS 5 year codes: https://api.census.gov/data/2022/acs/acs5/variables.html
Census Bureau’s data interface (for Decennial Census and ACS, and other census datasets): https://data.census.gov/cedsci/
National Historical Geographic Information System (NHGIS) (for ACS data and historical decennial Census data): https://www.nhgis.org/
4 Target Data Boundaries (Water Systems)
In this section, we’ll get the service area boundaries for Community Water Systems within the Sacramento County area. This will serve as the target dataset – i.e., the set of areas which we’ll be estimating the characteristics of – and will also be used to specify the geographic areas of the census data we want to retrieve. We’ll also get a dataset of county boundaries which overlap the water service areas in this study, which can also help with specifying what census data to access and/or be used to make maps and visualizations.
4.1 Read Water System Data
In this case, we’ll get the water system dataset from a shapefile that’s saved locally, then transform that dataset into a common coordinate reference system for mapping and analysis (which is defined above in the variable crs_projected).
This water system dataset comes from the California Drinking Water System Area Boundaries dataset. For this example, the dataset has been pre-filtered for systems within Sacramento County (by selecting records where the COUNTY field is “SACRAMENTO”) and for Community Water Systems (by selecting records where the STATE_CLAS field is “COMMUNITY”). Some un-needed fields have also been dropped, remaining fields have been re-orderd.
water_systems_sac <- st_read(here('02_data_input',
'water_supplier_boundaries_sac',
'System_Area_Boundary_Layer_Sac.shp')) %>%
st_transform(crs_projected) # transform to common coordinate systemWe can use the glimpse function (below) to take get a sense of what type of information is available in the water system dataset and how it’s structured.
glimpse(water_systems_sac)Rows: 62
Columns: 12
$ WATER_SY_1 <chr> "HOOD WATER MAINTENCE DIST [SWS]", "MC CLELLAN MHP", "MAGNO…
$ WATER_SYST <chr> "CA3400101", "CA3400179", "CA3400130", "CA3400135", "CA3400…
$ GLOBALID <chr> "{36268DB3-9DB2-4305-A85A-2C3A85F20F34}", "{E3BF3C3E-D516-4…
$ BOUNDARY_T <chr> "Water Service Area", "Water Service Area", "Water Service …
$ OWNER_TYPE <chr> "L", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P",…
$ COUNTY <chr> "SACRAMENTO", "SACRAMENTO", "SACRAMENTO", "SACRAMENTO", "SA…
$ REGULATING <chr> "LPA64 - SACRAMENTO COUNTY", "LPA64 - SACRAMENTO COUNTY", "…
$ FEDERAL_CL <chr> "COMMUNITY", "COMMUNITY", "COMMUNITY", "COMMUNITY", "COMMUN…
$ STATE_CLAS <chr> "COMMUNITY", "COMMUNITY", "COMMUNITY", "COMMUNITY", "COMMUN…
$ SERVICE_CO <dbl> 82, 199, 34, 64, 128, 83, 28, 50, 164, 5684, 14798, 115, 33…
$ POPULATION <dbl> 100, 700, 40, 150, 256, 150, 32, 100, 350, 18005, 44928, 20…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((-132703 403..., MULTIPOLYGON (…
Note that this dataset already includes a POPULATION variable that indicates the population served by each water system, which we renamed to water_system_population_reported above (note: I’m not exactly how the data in this variable is derived). However, for this analysis we’ll be making our own estimate of the population within each system’s service area based on U.S. Census Bureau data and the spatial representation of the system boundaries. Given the uncertainty in how the reported population data was derived (including potential temporal differences), the population estimates produced here will likely will not exactly match the reported population data; but, the reported population data may serve as a useful check to make sure our estimates are reasonable.
To make the water system data easier to work with, we can make some more descriptive field names (note that while it’s redundant, we’re using the prefix water_system_ for all field names to distinguish data types when joining this data with other datasets later).
water_systems_sac <- water_systems_sac %>%
rename(water_system_name = WATER_SY_1,
water_system_number = WATER_SYST,
water_system_id = GLOBALID,
water_system_boundary_type = BOUNDARY_T,
water_system_owner_type = OWNER_TYPE,
water_system_county = COUNTY,
water_system_regulating_agency = REGULATING,
water_system_federal_class = FEDERAL_CL,
water_system_state_class = STATE_CLAS,
water_system_service_connections = SERVICE_CO,
water_system_population_reported = POPULATION)Here’s a view of the structure of the revised dataset:
glimpse(water_systems_sac)Rows: 62
Columns: 12
$ water_system_name <chr> "HOOD WATER MAINTENCE DIST [SWS]", "M…
$ water_system_number <chr> "CA3400101", "CA3400179", "CA3400130"…
$ water_system_id <chr> "{36268DB3-9DB2-4305-A85A-2C3A85F20F3…
$ water_system_boundary_type <chr> "Water Service Area", "Water Service …
$ water_system_owner_type <chr> "L", "P", "P", "P", "P", "P", "P", "P…
$ water_system_county <chr> "SACRAMENTO", "SACRAMENTO", "SACRAMEN…
$ water_system_regulating_agency <chr> "LPA64 - SACRAMENTO COUNTY", "LPA64 -…
$ water_system_federal_class <chr> "COMMUNITY", "COMMUNITY", "COMMUNITY"…
$ water_system_state_class <chr> "COMMUNITY", "COMMUNITY", "COMMUNITY"…
$ water_system_service_connections <dbl> 82, 199, 34, 64, 128, 83, 28, 50, 164…
$ water_system_population_reported <dbl> 100, 700, 40, 150, 256, 150, 32, 100,…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((-132703 …
4.1.1 Alternative Data Retrieval Method
Reading in data from a shapefile is shown above because it’s likely one of the more common ways that users will access their target boundary data. However, depending on the dataset, there may be other ways to access the data. For example, the code chunk below demonstrates an alternative – using the arcgislayers package (Parry 2023) – that connects directly to the source dataset (to retrieve the most recent version) and applies the filters needed to reproduce the dataset in the System_Area_Boundary_Layer_Sac.shp file. Also, note that storing data in formats other than the common shapefile format – such as the geopackage format – can have some advantages (for example, see here).
# load arcgislayers package (see: https://r.esri.com/arcgislayers/index.html)
install.packages('pak') # only needed if the pak package is not already installed
pak::pkg_install("R-ArcGIS/arcgislayers", dependencies = TRUE)
library(arcgislayers)
# define link to data source
url_feature <- 'https://gispublic.waterboards.ca.gov/portalserver/rest/services/Drinking_Water/California_Drinking_water_system_numberem_Area_Boundaries/FeatureServer/0'
# connect to data source
water_systems_feature_layer <- arc_open(url_feature)
# download and filter data from source
water_systems_sac <- arc_select(
water_systems_feature_layer,
# apply filters
where = "COUNTY = 'SACRAMENTO' AND STATE_CLASSIFICATION = 'COMMUNITY'",
# select fields
fields = c('WATER_SYSTEM_NAME', 'WATER_SYSTEM_NUMBER', 'GLOBALID',
'BOUNDARY_TYPE', 'OWNER_TYPE_CODE', 'COUNTY',
'REGULATING_AGENCY', 'FEDERAL_CLASSIFICATION',
'STATE_CLASSIFICATION', 'SERVICE_CONNECTIONS', 'POPULATION')) %>%
# transform to common coordinate system
st_transform(crs_projected) %>%
# rename fields
rename(water_system_name = WATER_SYSTEM_NAME,
water_system_number = WATER_SYSTEM_NUMBER,
water_system_id = GLOBALID,
water_system_boundary_type = BOUNDARY_TYPE,
water_system_owner_type = OWNER_TYPE_CODE,
water_system_county = COUNTY,
water_system_regulating_agency = REGULATING_AGENCY,
water_system_federal_class = FEDERAL_CLASSIFICATION,
water_system_state_class = STATE_CLASSIFICATION,
water_system_service_connections = SERVICE_CONNECTIONS,
water_system_population_reported = POPULATION)4.2 Get County Boundaries
When accessing census data using the tidycensus R package as shown below (in Section 5), it’s often useful (though not strictly required) to know which counties overlap the target dataset (note that, even though the dataset is filtered for systems in Sacramento county, there are some systems whose boundaries extend into neighboring counties). County boundaries may also be useful for making maps in later stages of the analysis. We can get a dataset of county boundaries in California from the TIGER dataset, which can be accessed with R using the tigris R package (Walker 2023a).
counties_ca <- counties(state = 'CA',
cb = TRUE) %>% # simplified
st_transform(crs_projected) # transform to common coordinate systemThen, we can get a list of counties that overlap with the boundaries of the Sacramento area community water systems obtained above.
counties_overlap <- counties_ca %>%
st_filter(water_systems_sac,
.predicate = st_overlaps)
counties_list <- counties_overlap %>% pull(NAME)The counties in the counties_list variable are: San Joaquin, Yolo, Placer, Sacramento.
4.3 Plot Target Data
Figure 2 shows the water systems and county boundaries in an interactive map.
mapview(counties_overlap,
alpha.regions = 0,
zcol = 'NAME',
layer.name = 'County',
legend = FALSE) +
mapview(water_systems_sac,
zcol = 'water_system_name',
layer.name = 'Water System',
legend = FALSE)5 Accessing Census Data
The following sections demonstrate how to retrieve census data from the Decennial Census and the ACS using the tidycensus R package.
In order to use the tidycensus R package, you’ll need to obtain a personal API key from the US Census Bureau (which is free and available to anyone) by signing up here: http://api.census.gov/data/key_signup.html. Once you have your API key, you’ll need to register it in R by entering the command census_api_key(key = "YOUR API KEY", install = TRUE) in the console. Note that the install = TRUE argument means that the key is saved for all future R sessions, so you’ll only need to run that command once on your computer (rather than including it in your scripts). Alternatively, you could save your key to an environment variable and retrieve it using Sys.getenv(). Either way will help you avoid the possibility of entering your API key into any scripts that could be shared publicly.
Because the boundaries of census units (e.g., tracts, block groups, blocks, etc) can change over time, it’s important to make sure that the version (year) of the census data you’re retrieving matches the version of the census boundary dataset you’re using. The methods shown below retrieve the census boundary dataset together with the census demographic data, which ensures that this won’t be a potential problem. However, if you use a different workflow that retrieves the geographic boundaries and demographic data via separate processes, you should ensure that the versions are consistent.
Before downloading the census data, we can create an object that we can use to filter our requests to the census API so that they will only return census units that overlap with our target areas (the object will be passed to the filter_by argument of the get_decennial function below). Note that this isn’t strictly necessary (you could also apply the filter after making the API request), but may helpful to speed the query and reduce memory usage, especially in the case of large queries.
At the time of this writing, the filter_by argument of the tidycensus get_decennial and get_acs functions is fairly new, and not yet included in the official documentation.
Also, the filter_by argument is optional, and only appears to accept a simple features (sf) object with a single row / feature (e.g., a single water system), and will not accept an sf object with multiple rows / features. The process below attempts to work around this constraint by joining all of the selected water systems into a single multi-part polygon (i.e., an sf object with a single row). However, if you only want to retrieve data for census units that overlap a single target area (e.g., a single water system), you can skip this step.
water_systems_filter <- water_systems_sac %>%
st_union() %>%
st_as_sf()5.1 Decennial Census
This section retrieves census data from the Decennial Census, using the get_decennial function from the tidycensus package. As of this writing, the most recent version of the decennial census data available is from 2020, and we can set that as a variable below.
# set year
decennial_year <- 2020Next, we can define the list of demographic variables we’d like to retrieve tabular data for, by saving the census variables we want in the census_vars_decennial object (see Section 3.3 for more information about how to discover variables of interest and find their associated codes). Note that here we’re providing descriptive names associated with each variable code, which makes the data easier to work with later, but isn’t strictly necessary (i.e., you could just supply the variable codes alone).
# define variables to pull from the decennial census
census_vars_decennial <- c(
'population_hispanic_or_latino_count' = 'P2_002N', # Total Hispanic or Latino
'population_white_count' = 'P2_005N', # White (Not Hispanic or Latino)
'population_black_or_african_american_count' = 'P2_006N', # Black or African American (Not Hispanic or Latino)
'population_native_american_or_alaska_native_count' = 'P2_007N', # American Indian and Alaska Native (Not Hispanic or Latino)
'population_asian_count' = 'P2_008N', # Asian (Not Hispanic or Latino)
'population_pacific_islander_count' = 'P2_009N', # Native Hawaiian and Other Pacific Islander (Not Hispanic or Latino)
'population_other_count' = 'P2_010N', # Some other race (Not Hispanic or Latino)
'population_multiple_count' = 'P2_011N', # Two or more races (Not Hispanic or Latino)
'population_total_count' = 'P2_001N'
)Now, we can make the data request, using the get_decennial function, which accepts several arguments that specify exactly what data to return.
For this example we’re getting data at the ‘Block’ level (with the geography = 'block' argument) for the demographic variables defined above in the census_vars_decennial object (which is passed to the variables argument). As noted above, block-level data is the most granular level of spatial data available, and should provide the best results when estimating demographics for areas whose boundaries don’t align with census unit boundaries. However, depending on the use case, it may require too much time and computational resources to use the most granular spatial data, and may not be necessary to obtain a reasonable estimate. Also, keep in mind that block-level data may not be available for all variables, and some variables may only be available at less granular spatial scales (like block groups or tracts).
In addition to the tabular data associated with the demographic variables in our list, we’ll also get the spatial data – i.e., the boundaries of the census blocks – by setting the geometry = TRUE argument. When we do this, the tabular demographic data is pre-joined to the spatial data, so the API request returns a single dataset with both the spatial and attribute (demographic) data combined.
The tidycensus package generally returns the Census Bureau’s cartographic boundary shapefiles by default (as opposed to the core TIGER/Line shapefiles, which is the default format returned by the tigris R package). The default cartographic boundary shapefiles are pre-clipped to the US coastline, and are smaller/faster to process (alternatively you can use cb = FALSE to get the core TIGER/Line data) (see here). So the default spatial data returned by tidycensus may be somewhat different than the default spatial data returned by the tigris package, but in general I find it’s best to use the default tidycensus spatial data.
At the block level, it appears that tidycensus only returns the more detailed core TIGER/Line shapefiles (i.e., they are identical to the default block-level geographic data returned by tigris). In some cases, that can create minor inconsistencies when working with both blocks and block groups and using the default geographies.
We also narrow down the search parameters geographically by specifying the state (with state = 'CA') and counties (county = counties_list) we’re seeking data for, and provide an object to the filter_by argument which filters the data returned so that it only includes census units that overlap with our target areas (see Note 1 above for more information).
Supplying a list of counties may not be strictly necessary, especially in cases where you supply the optional filter_by argument. However, especially when working with granular data like blocks, supplying the county argument seems to greatly speed the API request.
Also, while by default the tidycensus package returns data in long/tidy format, we’re getting the data in wide format for this example (by specifying output = 'wide') because it’ll be easier to work with for the interpolation method described below to estimate demographics for non-census geographies.
# get census data
census_data_decennial <- get_decennial(geography = 'block', # can be 'block', 'block group', 'tract', 'county', etc.
state = 'CA',
county = counties_list,
filter_by = water_systems_filter,
year = decennial_year,
variables = census_vars_decennial,
output = 'wide', # can be 'wide' or 'tidy'
geometry = TRUE,
cache_table = TRUE) %>%
st_transform(crs_projected) # convert to common coordinate systemThe output is an sf object (i.e., a dataframe-like object that also includes spatial data), in wide format, where each row represents a census unit, and the population of each racial/ethnic group is reported in a separate column. Here’s a view of the contents and structure of the Decennial Census data that’s returned:
glimpse(census_data_decennial)Rows: 17,745
Columns: 12
$ GEOID <chr> "060670019003011", "…
$ NAME <chr> "Block 3011, Block G…
$ population_hispanic_or_latino_count <dbl> 4, 6, 8, 11, 1, 14, …
$ population_white_count <dbl> 20, 4, 167, 70, 86, …
$ population_black_or_african_american_count <dbl> 2, 2, 0, 8, 9, 18, 0…
$ population_native_american_or_alaska_native_count <dbl> 0, 0, 0, 0, 0, 0, 0,…
$ population_asian_count <dbl> 19, 5, 2, 1, 23, 8, …
$ population_pacific_islander_count <dbl> 0, 0, 0, 0, 0, 0, 0,…
$ population_other_count <dbl> 0, 0, 0, 0, 0, 0, 0,…
$ population_multiple_count <dbl> 8, 3, 4, 10, 5, 10, …
$ population_total_count <dbl> 53, 20, 181, 100, 12…
$ geometry <POLYGON [m]> POLYGON ((-1…
5.2 American Community Survey (ACS)
To get data from the ACS, you can use the get_acs() function, which is very similar to the get_decennial() function used above. As of this writing, the most recent version of the 5-year ACS data available is the 2018-2022 ACS, and we can set that as a variable below (which makes it easier to update this document in future years).
# set year
acs_year <- 2022However, since the ACS data contains data on a much broader set of socio-economic metrics, the requested data includes a greatly expanded list of variables, defined in the census_vars_acs object (see Section 3.3 for more information about how to discover variables of interest and find their associated codes). As above, we can provide descriptive names associated with each variable code, which makes the data easier to work with later, but isn’t strictly necessary (i.e., you could just supply the variable codes alone). Note that the use of prefixes (like population_ or households_) and suffixes (like _count) is intentional – those will be used later as part of the calculation process.
# define variables to pull from the ACS
census_vars_acs <- c(
# --- population variables ---
'population_total_count' = 'B01003_001',
'population_hispanic_or_latino_count' = 'B03002_012', # Total Hispanic or Latino
'population_white_count' = 'B03002_003', # White (Not Hispanic or Latino)
'population_black_or_african_american_count' = 'B03002_004', # Black or African American (Not Hispanic or Latino)
'population_native_american_or_alaska_native_count' = 'B03002_005', # American Indian and Alaska Native (Not Hispanic or Latino)
'population_asian_count' = 'B03002_006', # Asian (Not Hispanic or Latino)
'population_pacific_islander_count' = 'B03002_007', # Native Hawaiian and Other Pacific Islander (Not Hispanic or Latino)
'population_other_count' = 'B03002_008', # Some other race (Not Hispanic or Latino)
'population_multiple_count' = 'B03002_009', # Two or more races (Not Hispanic or Latino)
# --- poverty variables ---
'poverty_total_assessed_count' = 'B17021_001', # also available from 'B17020_001' (at the tract level only). Total population for whom poverty status is determined. Poverty status was determined for all people except institutionalized people, people in military group quarters, people in college dormitories, and unrelated individuals under 15 years old. These groups were excluded from the numerator and denominator when calculating poverty rates.
'poverty_below_level_count' = 'B17021_002', # also available from 'B17020_002' (at the tract level only). Population whose income in the past 12 months is below federal poverty level. A family and every individual in it are considered to be in poverty if the family's total income is less than the dollar value of a threshold that varies depending upon size of family, number of children, & age of householder (for 1- & 2- person households). Income is the sum of wage/salary income; net self-employment income; interest/dividends/net rental/royalty income/income from estates & trusts; Social Security/Railroad Retirement income; Supplemental Security Income (SSI); public assistance/welfare payments; retirement/survivor/disability pensions; & all other income.
'poverty_above_level_count' = 'B17021_019', # also available from 'B17020_010' (at the tract level only). Population whose income in the past 12 months is at or above federal poverty level. A family and every individual in it are considered to be in poverty if the family's total income is less than the dollar value of a threshold that varies depending upon size of family, number of children, & age of householder (for 1- & 2- person households). Income is the sum of wage/salary income; net self-employment income; interest/dividends/net rental/royalty income/income from estates & trusts; Social Security/Railroad Retirement income; Supplemental Security Income (SSI); public assistance/welfare payments; retirement/survivor/disability pensions; & all other income.
# --- household variables ---
'households_count' = 'B19001_001', # also available from variable 'B19053_001'. A household includes all the people who occupy a housing unit - a house, an apartment, a mobile home, a group of rooms, or a single room that is occupied. People not living in households are classified as living in group quarters.
'average_household_size' = 'B25010_001', # A measure obtained by dividing the number of people living in occupied housing units by the total number of occupied housing units. This measure is rounded to the nearest hundredth.
# --- household income variables ---
'median_household_income' = 'B19013_001', # also available from 'B19019_001' (at the tract level only). Income in the past 12 months is the sum of wage or salary income; net self-employment income; interest, dividends, or net rental or royalty income or income from estates and trusts; Social Security or Railroad Retirement income; Supplemental Security Income (SSI); public assistance or welfare payments; retirement, survivor, or disability pensions; and all other income.
'households_income_below_10k_count' = 'B19001_002', # count of households with income below $10,000
'households_income_10k_15k_count' = 'B19001_003', # count of households with income $10,000 to $15,000
'households_income_15k_20k_count' = 'B19001_004',
'households_income_20k_25k_count' = 'B19001_005',
'households_income_25k_30k_count' = 'B19001_006',
'households_income_30k_35k_count' = 'B19001_007',
'households_income_35k_40k_count' = 'B19001_008',
'households_income_40k_45k_count' = 'B19001_009',
'households_income_45k_50k_count' = 'B19001_010',
'households_income_50k_60k_count' = 'B19001_011',
'households_income_60k_75k_count' = 'B19001_012',
'households_income_75k_100k_count' = 'B19001_013',
'households_income_100k_125k_count' = 'B19001_014',
'households_income_125k_150k_count' = 'B19001_015',
'households_income_150k_200k_count' = 'B19001_016',
'households_income_above_200k_count' = 'B19001_017', # count of households with income above $200,000
# --- housing costs variables (% of household income) ---
# Housing Costs as a Percentage of Household Income in the past 12 months - NOTE: THIS TABLE IS NEW FOR THE 2022 ACS, AND WON'T BE AVAILABLE FOR PREVIOUS YEARS - Table B25140 shows the count of households paying more than 30% of their income towards housing costs broken out by three tenure categories (owned with a mortgage, owned without a mortgage, and rented). The table also shows the number of households paying more than 50% of their income toward housing costs.
# 'households_count' = 'B25140_001',
'households_mortgage_total_count' = 'B25140_002',
'households_mortgage_housing_costs_over30pct_count' = 'B25140_003',
'households_mortgage_housing_costs_over50pct_count' = 'B25140_004',
'households_no_mortgage_total_count' = 'B25140_006',
'households_no_mortgage_housing_costs_over30pct_count' = 'B25140_007',
'households_no_mortgage_housing_costs_over50pct_count' = 'B25140_008',
'households_rent_total_count' = 'B25140_010',
'households_rent_housing_costs_over30pct_count' = 'B25140_011',
'households_rent_housing_costs_over50pct_count' = 'B25140_012',
# --- other income / economic variables ---
'per_capita_income' = 'B19301_001' # note: per capita income by race (at block group level) available in table B19301I
)Finally, we can make the data request, using the get_acs function, which is very similar to the get_decennial function described above ( Section 5.1). However, for this example we’re getting data at the ‘Block Group’ level (with the geography = 'block group' argument), which is the most granular level of spatial data available for ACS data. But, keep in mind that block group-level data may not be available for all variables, and some variables may only be available at less granular spatial scales (like tracts). Note that the water_systems_filter object supplied to the filter_by argument was created above in Listing 1 (and see Note 1 above for more information about this argument).
# get census data
census_data_acs <- get_acs(geography = 'block group',
state = 'CA',
county = counties_list,
filter_by = water_systems_filter,
year = acs_year,
survey = 'acs5',
variables = census_vars_acs,
output = 'wide', # can be 'wide' or 'tidy'
geometry = TRUE,
cache_table = TRUE) %>%
st_transform(crs_projected) # convert to common coordinate systemAs above, the output is an sf object (i.e., a dataframe-like object that also includes spatial data), in wide format, where each row represents a census unit, and the each demographic variable is reported in a separate column. Here’s a view of the contents and structure of the 2022 5-year ACS data that’s returned (only the first few fields are shown):
glimpse(census_data_acs[,1:20])Rows: 1,054
Columns: 21
$ GEOID <chr> "060670081451", "06…
$ NAME <chr> "Block Group 1; Cen…
$ population_total_countE <dbl> 1768, 1881, 1098, 2…
$ population_total_countM <dbl> 520, 585, 395, 583,…
$ population_hispanic_or_latino_countE <dbl> 38, 327, 376, 782, …
$ population_hispanic_or_latino_countM <dbl> 59, 298, 280, 315, …
$ population_white_countE <dbl> 1627, 1337, 293, 18…
$ population_white_countM <dbl> 521, 475, 191, 460,…
$ population_black_or_african_american_countE <dbl> 0, 1, 272, 26, 351,…
$ population_black_or_african_american_countM <dbl> 13, 3, 251, 38, 334…
$ population_native_american_or_alaska_native_countE <dbl> 41, 0, 0, 26, 0, 0,…
$ population_native_american_or_alaska_native_countM <dbl> 58, 13, 13, 42, 13,…
$ population_asian_countE <dbl> 45, 0, 105, 58, 144…
$ population_asian_countM <dbl> 71, 13, 116, 66, 18…
$ population_pacific_islander_countE <dbl> 0, 98, 0, 0, 27, 13…
$ population_pacific_islander_countM <dbl> 13, 98, 13, 13, 50,…
$ population_other_countE <dbl> 0, 0, 39, 0, 0, 0, …
$ population_other_countM <dbl> 13, 13, 63, 13, 13,…
$ population_multiple_countE <dbl> 17, 118, 13, 39, 15…
$ population_multiple_countM <dbl> 27, 125, 20, 57, 25…
$ geometry <POLYGON [m]> POLYGON ((-…
Note that the dataset that’s returned includes fields corresponding to Margin of Error (MOE) for each variable we’ve requested (these are the fields that end an M – e.g., “population_total_countM”), since, as noted above in Section 3.2 , the ACS is based on a sample of the population and reports estimated values.
For further analysis, we may want to get the statewide data as a baseline for comparison (this could also be done for other scales, like the county level). We can use a similar process to get that data and clean/format it to match the more detailed data obtained above. Note that in this case we’re also using the 5-year ACS (even though the 1-year ACS is also available at the statewide level, and would provide more up-to-date data) so that the statewide data will be directly comparable to the block group level data obtained above.
census_data_acs_state <- get_acs(geography = 'state',
state = 'CA',
year = acs_year,
survey = 'acs5',
variables = census_vars_acs,
output = 'wide', # can be 'wide' or 'tidy'
geometry = TRUE,
cache_table = TRUE) %>%
st_transform(crs_projected) %>% # convert to common coordinate system
select(-matches('M$')) %>% # the $ specifies "ends with"
# clean names (note this is a little different than the way we renamed fields above, either works)
rename_with(.fn = ~ str_remove(., # remove 'E' (estimate) from field names
pattern = 'E$')) %>%
rename_with(.fn = ~ str_replace(., # add 'E' back to NAME field
pattern = 'NAM',
replacement = 'NAME'))5.3 Plot Census & Supplier Data
system_plot <- 'SACRAMENTO SUBURBAN WATER DISTRICT'Figure 3 shows the 2022 5-year ACS census units that overlap with one of the water systems (Sacramento Suburban Water District) that we’ll compute demographics for below (plotting the census units that overlap all systems tends to be slow in this format).
mapview(water_systems_sac %>%
filter(water_system_name == system_plot),
zcol = 'water_system_name',
layer.name = 'Water System',
legend = FALSE) +
mapview(census_data_acs %>%
st_filter(water_systems_sac %>%
filter(water_system_name == system_plot)),
alpha.regions = 0,
color = 'cyan',
lwd = 1.3, label = 'NAME',
layer.name = 'ACS Data',
legend = FALSE) # zcol = 'NAME'6 Compute Water System Demographics
Now we can perform the calculations to estimate demographic characteristics for our target areas (water system service boundaries in the Sacramento County area) from our source demographic dataset (the census data we obtained above). For this example, we’ll use the 2022 5-year ACS data that we retrieved above (which is saved in the census_data_acs variable) as our source of demographic data, and we’ll estimate the following for each water system’s service area:
- Population of each racial/ethnic group (using the racial/ethnic categories defined in the census dataset), and each racial/ethnic group’s portion of the total service area population
- Socio-economic variables like poverty rate, median household income, income distributions, and per capita income
There are multiple ways this estimation can be done. For this example, we’ll employ a three step strategy:
Estimate values for count-based variables (typically referred to as ‘extensive’ data types) – e.g., total population, popultion by race/ethnicity, population above / below poverty rate, households by income bracket – for overlapping census unit, using areal interpolation. This is essentially an area weighted average, which estimates how much of each source unit’s (census unit) count applies to the target area (a given water service area), based on the portion of its area that overlaps that target area – for more information about the process, see this documentation from the
arealR package. For example, for a census unit that partially overlaps a service area, only a fraction of its count for a given variable will be applied to that service area; for a census unit that completely overlaps a service area, the full count for that variable will be applied to the service area.The major simplifying assumption of this approach is that the population or count-based variable of interest are evenly distributed within each unit in the source data. For example, in this case we’re assuming that population (including the total population and the population of each racial/ethic group), households of each income bracket, populations above / below the poverty rate, etc. are evenly distributed within each census block group.
While this section uses the block group-level count data from the 5-year ACS, there may be cases where it could be useful or necessary to use more granular block-level population data from the decennial census to estimate population densities and distributions within larger census units, like block groups and tracts. This could especially be the case when estimating characteristics for small areas in rural environments. See Section 9 and/or Section 10 for more information.
- Using the estimated count data (populations, households, etc), compute weighted values for variables that describe those populations, using the associated count data as a weighting factor (e.g., population-weighted values for population based data, or household-weighted values for household-based data) – these variables are typically referred to as ‘intensive’ data types.
Although it’s possible to use areal interpolation to aggregate these variables as well, the multi-step approach described here can be useful because we know (from the population / household count data) that population densities differ between census units. Since we have a reasonable estimate of the count data (population, households, etc) within each census unit, using a population or household weighted average likely will yield more accurate results than a simple area-weighted average for these variables. For example, for per capita income, we can use the estimated population counts to produce a population weighted average per capita income (rather than an area weighted average per capita income, which is likely less meaningful as it over-weights large census areas with lower population densities). Areal interpolation may be more useful for cases where we generally have no other information about how density varies between the source polygons (unless significantly more effort is invested, such as looking at aerial imagery data)
- Aggregate interpolated values at the water system level.
6.1 Prepare Census Data
Note that we already transformed the 2022 5-year ACS dataset into the common projected coordinate reference system used for this example immediately after we downloaded the data using the get_acs() function (see Listing 3). This allows us to work with the water system data and the census data together in a common coordinate system.
Before calculating demographics for the target areas, we can do a bit of additional transformation to prepare the census data. First, because we won’t be incorporating the margin of error (MOE) into the analysis below, we can drop them for this example, then clean up the field names.
It is possible to calculate MOEs for derived estimates – e.g., when aggregating groups of census units – and in many cases it may be worthwhile to do that to provide extra context to the data. However, it may not be possible (or may be very difficult) to calculate MOEs for data estimated using more complex aggregations, such as the areal interpolation shown below – more research on that may be needed.
For guidance on how calculate MOEs for some types of derived estimates, see this document.
For an alternative, simplified approach to estimating census demographics for target areas which includes MOEs for the derived estimates, see Section 11.1.
# drop MOE fields
census_data_acs <- census_data_acs %>%
select(-matches('M$')) # the $ specifies "ends with"
# clean names
names(census_data_acs) <- names(census_data_acs) %>%
str_remove('E$') %>% # remove 'E' (estimate) from field names
str_replace('NAM', 'NAME') # add 'E' back to NAME fieldHere’s a view of the contents and structure of the revised 2022 5-year ACS dataset (only the first few fields are shown):
glimpse(census_data_acs[,1:20])Rows: 1,054
Columns: 21
$ GEOID <chr> "060670081451", "060…
$ NAME <chr> "Block Group 1; Cens…
$ population_total_count <dbl> 1768, 1881, 1098, 27…
$ population_hispanic_or_latino_count <dbl> 38, 327, 376, 782, 3…
$ population_white_count <dbl> 1627, 1337, 293, 181…
$ population_black_or_african_american_count <dbl> 0, 1, 272, 26, 351, …
$ population_native_american_or_alaska_native_count <dbl> 41, 0, 0, 26, 0, 0, …
$ population_asian_count <dbl> 45, 0, 105, 58, 144,…
$ population_pacific_islander_count <dbl> 0, 98, 0, 0, 27, 13,…
$ population_other_count <dbl> 0, 0, 39, 0, 0, 0, 0…
$ population_multiple_count <dbl> 17, 118, 13, 39, 15,…
$ poverty_total_assessed_count <dbl> 1768, 1847, 1098, 27…
$ poverty_below_level_count <dbl> 101, 328, 272, 116, …
$ poverty_above_level_count <dbl> 1667, 1519, 826, 263…
$ households_count <dbl> 680, 718, 405, 905, …
$ average_household_size <dbl> 2.59, 2.62, 2.71, 2.…
$ median_household_income <dbl> 123500, 66768, 56216…
$ households_income_below_10k_count <dbl> 18, 47, 10, 22, 6, 1…
$ households_income_10k_15k_count <dbl> 0, 0, 24, 0, 15, 231…
$ households_income_15k_20k_count <dbl> 0, 13, 18, 0, 51, 12…
$ geometry <POLYGON [m]> POLYGON ((-1…
We can also do some other transformations to simplify the data. For example, we can combine the ‘other’ and ‘multiple’ racial/ethnic groupings into one ‘other or multiple’ racial/ethnic group.
## combine other and multiple
census_data_acs <- census_data_acs %>%
mutate('population_other_or_multiple_count' =
population_other_count + population_multiple_count,
.after = population_pacific_islander_count) %>%
select(-c(population_other_count, population_multiple_count))And we can calculate the poverty rate for each census unit (which may be useful for presenting results later).
census_data_acs <- census_data_acs %>%
mutate(poverty_rate_pct_calc_census_unit = case_when(
poverty_total_assessed_count == 0 ~ 0,
.default = 100 * poverty_below_level_count / poverty_total_assessed_count
),
.after = poverty_above_level_count)6.2 Interpolation Step 1: Areal Interpolation (for Count Variables)
There are a couple of ways to implement the areal interpolation method. The example below ‘manually’ implements the process using functions from the sf package, for reasons described below. However, note that there are R packages which make it possible to perform areal interpolation with a single function - for example, the sf package’s st_interpolate_aw function and the areal package’s aw_interpolate function. This example uses a more ‘manual’ approach because this makes it possible to use the multi-step process described above, and also produces useful intermediate calculated data for mapping and visualization. However, we can use the single-function approach to double check our implementation of the areal interpolation approach for the count data (see Section 6.4.2).
First, we clip the census data to the water system boundaries:
census_data_clip <- census_data_acs %>%
mutate(census_unit_area = st_area(.)) %>%
st_intersection(water_systems_sac) %>%
mutate(clipped_area = st_area(.)) %>%
mutate(areal_weight_factor = drop_units(clipped_area / census_unit_area))Figure 4 shows a plot of the census units clipped to the Sacramento Suburban Water District water system, along with the original/complete census units. Note that you can toggle layers on and off (and change their order of appearance) using the layers button in the upper left part of the map (below the zoom buttons).
mapview(water_systems_sac %>%
filter(water_system_name == system_plot),
zcol = 'water_system_name',
layer.name = 'Water System',
legend = FALSE) +
mapview(census_data_acs %>%
st_filter(water_systems_sac %>%
filter(water_system_name == system_plot)),
alpha.regions = 0.15,
col.regions = 'grey',
color = 'black',
lwd = 1,
label = 'NAME',
layer.name = 'ACS Data Full',
legend = FALSE) +
mapview(census_data_clip %>%
filter(water_system_name == system_plot),
alpha.regions = 0,
color = 'cyan',
lwd = 1.3,
label = 'NAME',
layer.name = 'ACS Data Clipped',
legend = FALSE)Next, we can compute the area-weighted counts for the portions of census units that overlap each water system boundary:
census_data_interpolate <- census_data_clip %>%
mutate(
across(
.cols = ends_with('_count'),
.fns = ~ .x * areal_weight_factor
)) As noted above, it’s also possible to use pre-built functions from several R packages to perform areal interpolation in a single step. Since we’re using a three-step process, which also implements population weighted averaging for some variables, we’re not using those functions directly in this example. However, they can be a useful check to validate our computed count data, but only after we aggregate our data at the system level – see Section 6.4.2 for more details.
6.3 Interpolation Step 2: Compute Population Weighted Values (Intensive Variables)
Compute population weighted values
census_data_interpolate <- census_data_interpolate %>%
mutate(average_household_size_weighted = average_household_size * households_count,
median_household_income_weighted = median_household_income * households_count,
per_capita_income_weighted = per_capita_income * population_total_count)To calculate an aggregated value for a variable like median household income, which depends on the distribution of the underling data, it may be worth considering whether a weighed average value is an appropriate measure. In some cases, it may be more appropriate to use the counts in each income bracket to estimate a median income, and/or present the income distribution rather than a single value.
For a discussion of the problem and a proposed solution, see this document.
6.4 Interpolation Step 3: Aggregate by Water System
Next, we need to combine the weighted values calculated above to produce the estimates for each water system, and can also use those combined values to compute some additional metrics for each system (like rates, income distributions, etc.).
6.4.1 Combine Results by Water System
First, combine the results by summing all of the count-based variables (derived from areal interpolation), and calculating weighted averages for all variables computed in step 2 above. Note that we have to first calculate the denominator for each variable calculated with population weighted interpolation, because some of those variables contain missing values for records where the denominator is present (and if we don’t remove the missing values, we get an NA for any water system that contains a block group with a missing value for that variable). For example, there are block groups where the median household income is missing, but the total household count is available for that block group – in that case, the weighted average should not include the households in that block group in the denominator; otherwise, the true value will be underestimated.
water_system_demographics <- census_data_interpolate %>%
group_by(water_system_name) %>%
mutate(
average_household_size_denominator = if_else(
is.na(average_household_size),
0,
households_count),
median_household_income_denominator = if_else(
is.na(median_household_income),
0,
households_count),
per_capita_income_denominator = if_else(
is.na(per_capita_income),
0,
population_total_count)
) %>%
summarize(
across(
.cols = ends_with('_count'),
.fns = ~ sum(.x)
),
average_household_size_hh_weighted =
sum(average_household_size_weighted, na.rm = TRUE) /
sum(average_household_size_denominator),
median_household_income_hh_weighted =
sum(median_household_income_weighted, na.rm = TRUE) /
sum(median_household_income_denominator),
per_capita_income_pop_weighted =
sum(per_capita_income_weighted, na.rm = TRUE) /
sum(per_capita_income_denominator)
) %>%
ungroup()6.4.2 Check - Count Variables Estimated with Areal Interpolation
As noted above, it’s also possible to use pre-built functions for areal interpolation. This section demonstrates those functions and uses them as a check of our computed count data.
From the sf package, we can use the st_interpolate_aw function:
# NOTE: it's only necessary to check the estimated values for one variable -
# this just checks the total estimated population
# sf package
check_sf <- st_interpolate_aw(x = census_data_acs %>%
select(population_total_count),
to = water_systems_sac,
extensive = TRUE) %>%
bind_cols(water_systems_sac %>% st_drop_geometry)
# check - should be TRUE if results are equivalent
all(check_sf %>%
arrange(water_system_name) %>%
pull(population_total_count) %>%
round(5) ==
water_system_demographics %>%
arrange(water_system_name) %>%
pull(population_total_count) %>%
round(5)
)[1] TRUE
From the areal package, we can use the aw_interpolate function. Note that there are some settings that you may need to modify in the aw_interpolate function depending on the type of analysis you’re doing. In particular, for more information about the weight argument – which can be either sum or total – see this section of the documentation. For more information about extensive versus intensive interpolations, see this section of the documenation (as noted above, the method applied here avoids using areal interpolation to calculate intensive variables, because area may not be a good metric for determining how to weight those variables, considering that we can estimate associated counts for populations/households/etc.).
# NOTE: it's only necessary to check the estimated values for one variable -
# this just checks the total estimated population
# areal package
check_areal <- aw_interpolate(water_systems_sac,
tid = water_system_name,
source = census_data_acs,
sid = GEOID,
weight = 'total',
extensive = c('population_total_count'))
# check - should be TRUE if results are equivalent
all(check_areal %>%
arrange(water_system_name) %>%
pull(population_total_count) %>%
round(5) ==
water_system_demographics %>%
arrange(water_system_name) %>%
pull(population_total_count) %>%
round(5)
)[1] TRUE
6.4.3 Clean & Format Summarized Water System Demographic Data
We could stop here, and save the dataset containing the results to an output file (done below - see Section 6.5). But, it may be useful to do some additional computations and re-formatting before saving the dataset. For example, in this case it may be useful to calculate the racial/ethnic breakdown of each system’s population as percentages of the total population (in addition to the total counts computed above), and calculate other rates / distributions.
First we can add fields with each racial/ethnic group’s estimated percent of the total population within each water system’s service area:
water_system_demographics <- water_system_demographics %>%
mutate(
across(
.cols = starts_with('population_'),
.fns = ~ round(.x / population_total_count * 100, 2),
.names = "{str_replace(.col, '_count', '_percent')}"
),
.after = population_other_or_multiple_count) %>%
select(-population_total_percent) # this always equals 1, not neededWe can also calculate the estimated poverty rate for each water system’s service area.
water_system_demographics <- water_system_demographics %>%
mutate(poverty_rate_percent = case_when(
poverty_total_assessed_count == 0 ~ 0,
.default = 100 * poverty_below_level_count / poverty_total_assessed_count
),
.after = poverty_above_level_count)And compute income brackets in 25k increments:
water_system_demographics <- water_system_demographics %>%
mutate(households_income_0_25k_count =
households_income_below_10k_count +
households_income_10k_15k_count +
households_income_15k_20k_count +
households_income_20k_25k_count,
households_income_25k_50k_count =
households_income_25k_30k_count +
households_income_30k_35k_count +
households_income_35k_40k_count +
households_income_40k_45k_count +
households_income_45k_50k_count,
households_income_50k_75k_count =
households_income_50k_60k_count +
households_income_60k_75k_count,
.after = households_income_above_200k_count
) # note - above 75k is already in 25k incrementsAnd compute income brackets in 50k increments:
water_system_demographics <- water_system_demographics %>%
mutate(households_income_0_50k_count =
households_income_0_25k_count +
households_income_25k_50k_count,
households_income_50k_100k_count =
households_income_50k_75k_count +
households_income_75k_100k_count,
households_income_100k_150k_count =
households_income_100k_125k_count +
households_income_125k_150k_count,
.after = households_income_50k_75k_count
) # above 150k is already in 50k incrementsAnd compute grouped median household income:
And compute # and % of households below income thresholds:
And, compute the portion of households paying more than 30% / 50% of their income toward housing costs:
# portion of households paying more than 30% / 50% of income on housing
water_system_demographics <- water_system_demographics %>%
mutate(households_all_housing_costs_over30pct_percent =
100 * (households_mortgage_housing_costs_over30pct_count +
households_no_mortgage_housing_costs_over30pct_count +
households_rent_housing_costs_over30pct_count) /
households_count,
.after = households_rent_housing_costs_over50pct_count) %>%
mutate(households_all_housing_costs_over50pct_percent =
100 * (households_mortgage_housing_costs_over50pct_count +
households_no_mortgage_housing_costs_over50pct_count +
households_rent_housing_costs_over50pct_count) /
households_count,
.after = households_all_housing_costs_over30pct_percent)Finally, we can round the estimated values to appropriate levels of precision:
water_system_demographics <- water_system_demographics %>%
mutate(
across(
.cols = ends_with('_count'),
.fns = ~ round(.x, 0)
)) %>%
mutate(
across(
.cols = ends_with('_percent'),
.fns = ~ round(.x, 2)
))We now have a dataset with the selected metrics from the census data (source data) estimated for each of the water system service areas (target geographic features). Here’s a view of the contents and structure of the re-formatted dataset (only the first few fields are shown):
glimpse(water_system_demographics[,1:20])Rows: 62
Columns: 21
$ water_system_name <chr> "B & W RESORT MARI…
$ population_total_count <dbl> 0, 22603, 33120, 1…
$ population_hispanic_or_latino_count <dbl> 0, 10939, 5245, 34…
$ population_white_count <dbl> 0, 3504, 19456, 23…
$ population_black_or_african_american_count <dbl> 0, 2663, 3199, 197…
$ population_native_american_or_alaska_native_count <dbl> 0, 121, 113, 70, 0…
$ population_asian_count <dbl> 0, 4075, 2947, 108…
$ population_pacific_islander_count <dbl> 0, 240, 77, 59, 0,…
$ population_other_or_multiple_count <dbl> 0, 1060, 2082, 110…
$ population_hispanic_or_latino_percent <dbl> 41.43, 48.40, 15.8…
$ population_white_percent <dbl> 52.47, 15.50, 58.7…
$ population_black_or_african_american_percent <dbl> 0.00, 11.78, 9.66,…
$ population_native_american_or_alaska_native_percent <dbl> 0.00, 0.54, 0.34, …
$ population_asian_percent <dbl> 4.55, 18.03, 8.90,…
$ population_pacific_islander_percent <dbl> 0.00, 1.06, 0.23, …
$ population_other_or_multiple_percent <dbl> 1.56, 4.69, 6.29, …
$ poverty_total_assessed_count <dbl> 0, 22556, 33034, 1…
$ poverty_below_level_count <dbl> 0, 6010, 3389, 313…
$ poverty_above_level_count <dbl> 0, 16546, 29645, 6…
$ poverty_rate_percent <dbl> 22.60, 26.64, 10.2…
$ geometry <POLYGON [m]> POLYGON ((…
Table 1 provides a complete view of the cleaned and re-formatted dataset. These results are saved locally in tabular and spatial format in Section 6.5.
water_system_demographics %>%
kable(caption = 'A Caption') %>%
scroll_box(height = "400px")| water_system_name | population_total_count | population_hispanic_or_latino_count | population_white_count | population_black_or_african_american_count | population_native_american_or_alaska_native_count | population_asian_count | population_pacific_islander_count | population_other_or_multiple_count | population_hispanic_or_latino_percent | population_white_percent | population_black_or_african_american_percent | population_native_american_or_alaska_native_percent | population_asian_percent | population_pacific_islander_percent | population_other_or_multiple_percent | poverty_total_assessed_count | poverty_below_level_count | poverty_above_level_count | poverty_rate_percent | households_count | households_income_below_10k_count | households_income_10k_15k_count | households_income_15k_20k_count | households_income_20k_25k_count | households_income_25k_30k_count | households_income_30k_35k_count | households_income_35k_40k_count | households_income_40k_45k_count | households_income_45k_50k_count | households_income_50k_60k_count | households_income_60k_75k_count | households_income_75k_100k_count | households_income_100k_125k_count | households_income_125k_150k_count | households_income_150k_200k_count | households_income_above_200k_count | households_income_0_25k_count | households_income_25k_50k_count | households_income_50k_75k_count | households_income_0_50k_count | households_income_50k_100k_count | households_income_100k_150k_count | households_mortgage_total_count | households_mortgage_housing_costs_over30pct_count | households_mortgage_housing_costs_over50pct_count | households_no_mortgage_total_count | households_no_mortgage_housing_costs_over30pct_count | households_no_mortgage_housing_costs_over50pct_count | households_rent_total_count | households_rent_housing_costs_over30pct_count | households_rent_housing_costs_over50pct_count | households_all_housing_costs_over30pct_percent | households_all_housing_costs_over50pct_percent | average_household_size_hh_weighted | median_household_income_hh_weighted | per_capita_income_pop_weighted | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B & W RESORT MARINA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 41.43 | 52.47 | 0.00 | 0.00 | 4.55 | 0.00 | 1.56 | 0 | 0 | 0 | 22.60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 40.53 | 21.84 | 2.030000 | 51977.00 | 40522.00 | POLYGON ((-138282.2 13643.2... |
| CAL AM FRUITRIDGE VISTA | 22603 | 10939 | 3504 | 2663 | 121 | 4075 | 240 | 1060 | 48.40 | 15.50 | 11.78 | 0.54 | 18.03 | 1.06 | 4.69 | 22556 | 6010 | 16546 | 26.64 | 6900 | 354 | 339 | 521 | 263 | 367 | 302 | 359 | 355 | 565 | 692 | 876 | 784 | 459 | 235 | 287 | 141 | 1477 | 1948 | 1569 | 3425 | 2352 | 694 | 1620 | 745 | 345 | 1236 | 95 | 58 | 4044 | 2131 | 1059 | 43.06 | 21.18 | 3.257806 | 53040.44 | 20519.57 | POLYGON ((-127001.4 54266.8... |
| CALAM - ANTELOPE | 33120 | 5245 | 19456 | 3199 | 113 | 2947 | 77 | 2082 | 15.84 | 58.74 | 9.66 | 0.34 | 8.90 | 0.23 | 6.29 | 33034 | 3389 | 29645 | 10.26 | 10529 | 315 | 184 | 101 | 122 | 116 | 469 | 248 | 368 | 449 | 737 | 1077 | 1669 | 1501 | 1077 | 1158 | 937 | 723 | 1650 | 1814 | 2373 | 3483 | 2578 | 5544 | 1861 | 621 | 1747 | 184 | 106 | 3238 | 1678 | 649 | 35.36 | 13.07 | 3.134530 | 93741.55 | 34660.44 | POLYGON ((-120906.3 77326.5... |
| CALAM - ARDEN | 10112 | 3433 | 2392 | 1977 | 70 | 1082 | 59 | 1100 | 33.95 | 23.65 | 19.55 | 0.69 | 10.70 | 0.58 | 10.87 | 10034 | 3130 | 6904 | 31.19 | 3823 | 201 | 259 | 239 | 167 | 319 | 190 | 142 | 236 | 207 | 440 | 394 | 535 | 228 | 148 | 62 | 58 | 866 | 1093 | 834 | 1959 | 1368 | 376 | 265 | 84 | 46 | 133 | 8 | 3 | 3426 | 2124 | 1170 | 57.97 | 31.87 | 2.623643 | 49624.62 | 22770.82 | POLYGON ((-123052 64046.06,... |
| CALAM - ISLETON | 34 | 14 | 17 | 0 | 0 | 2 | 0 | 1 | 42.06 | 51.14 | 0.00 | 0.00 | 4.55 | 0.00 | 2.25 | 34 | 7 | 27 | 20.89 | 16 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 2 | 1 | 1 | 3 | 1 | 0 | 1 | 4 | 3 | 3 | 6 | 4 | 4 | 6 | 4 | 1 | 7 | 2 | 2 | 4 | 1 | 1 | 39.45 | 20.68 | 2.078994 | 57361.76 | 40672.21 | POLYGON ((-138730.9 17272.8... |
| CALAM - LINCOLN OAKS | 42916 | 9056 | 26529 | 1486 | 143 | 2706 | 288 | 2708 | 21.10 | 61.82 | 3.46 | 0.33 | 6.31 | 0.67 | 6.31 | 42823 | 4074 | 38749 | 9.51 | 15621 | 740 | 375 | 308 | 622 | 488 | 616 | 585 | 629 | 645 | 1035 | 1641 | 2442 | 1889 | 1272 | 1555 | 778 | 2046 | 2964 | 2675 | 5010 | 5118 | 3161 | 7390 | 2671 | 919 | 3332 | 503 | 298 | 4900 | 2523 | 1302 | 36.46 | 16.13 | 2.730281 | 82035.52 | 33728.94 | POLYGON ((-117495.2 73240.4... |
| CALAM - PARKWAY | 58635 | 18665 | 8921 | 6965 | 21 | 19228 | 1386 | 3449 | 31.83 | 15.21 | 11.88 | 0.04 | 32.79 | 2.36 | 5.88 | 58434 | 9804 | 48630 | 16.78 | 17667 | 1081 | 753 | 514 | 713 | 694 | 640 | 713 | 700 | 727 | 1145 | 1918 | 2490 | 1634 | 1532 | 1546 | 865 | 3061 | 3475 | 3064 | 6536 | 5554 | 3166 | 7163 | 2719 | 1049 | 3418 | 647 | 383 | 7086 | 3517 | 1917 | 38.96 | 18.96 | 3.284608 | 72938.51 | 26938.14 | POLYGON ((-124522.5 52428.5... |
| CALAM - SUBURBAN ROSEMONT | 57897 | 13791 | 25062 | 7725 | 91 | 6905 | 380 | 3942 | 23.82 | 43.29 | 13.34 | 0.16 | 11.93 | 0.66 | 6.81 | 57661 | 8374 | 49287 | 14.52 | 21045 | 1156 | 612 | 472 | 744 | 653 | 568 | 582 | 874 | 628 | 1289 | 2508 | 3438 | 2595 | 1594 | 1671 | 1661 | 2985 | 3305 | 3797 | 6290 | 7235 | 4189 | 8262 | 2262 | 730 | 3425 | 439 | 271 | 9358 | 4521 | 2320 | 34.31 | 15.78 | 2.726937 | 81229.87 | 34497.37 | POLYGON ((-119360.4 58937.6... |
| CALAM - WALNUT GROVE | 12 | 5 | 5 | 0 | 0 | 1 | 0 | 0 | 44.60 | 45.84 | 0.00 | 0.00 | 5.93 | 0.00 | 3.63 | 12 | 2 | 10 | 15.75 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 24.49 | 14.65 | 2.490000 | 68248.00 | 38950.00 | POLYGON ((-131705.3 26403.6... |
| CALIFORNIA STATE FAIR | 532 | 78 | 262 | 91 | 0 | 48 | 0 | 52 | 14.68 | 49.25 | 17.13 | 0.00 | 9.10 | 0.00 | 9.85 | 526 | 152 | 374 | 28.89 | 285 | 65 | 13 | 8 | 5 | 9 | 14 | 2 | 0 | 23 | 29 | 30 | 35 | 21 | 11 | 17 | 3 | 91 | 48 | 59 | 140 | 93 | 32 | 0 | 0 | 0 | 0 | 0 | 0 | 285 | 177 | 95 | 62.11 | 33.45 | 1.820000 | 52886.00 | 33141.00 | POLYGON ((-125611.2 65287.3... |
| CARMICHAEL WATER DISTRICT | 39253 | 6192 | 25026 | 2230 | 68 | 3326 | 295 | 2116 | 15.78 | 63.76 | 5.68 | 0.17 | 8.47 | 0.75 | 5.39 | 38700 | 5000 | 33700 | 12.92 | 15937 | 570 | 534 | 513 | 472 | 398 | 607 | 522 | 684 | 541 | 996 | 1595 | 1782 | 1724 | 1200 | 1678 | 2122 | 2088 | 2751 | 2591 | 4839 | 4373 | 2924 | 5256 | 1399 | 669 | 3147 | 358 | 177 | 7534 | 4056 | 2068 | 36.48 | 18.29 | 2.405914 | 96967.64 | 46901.80 | POLYGON ((-117711 65208.06,... |
| CITRUS HEIGHTS WATER DISTRICT | 68912 | 12380 | 48148 | 2092 | 162 | 2875 | 71 | 3186 | 17.96 | 69.87 | 3.04 | 0.23 | 4.17 | 0.10 | 4.62 | 68581 | 6961 | 61620 | 10.15 | 25633 | 1012 | 569 | 446 | 769 | 665 | 867 | 841 | 723 | 1165 | 1875 | 3057 | 3954 | 2744 | 2332 | 2533 | 2080 | 2796 | 4261 | 4932 | 7057 | 8886 | 5075 | 10344 | 3553 | 1380 | 4293 | 554 | 286 | 10996 | 5759 | 2620 | 38.49 | 16.72 | 2.653808 | 82960.78 | 37323.17 | POLYGON ((-114405.5 72735.6... |
| CITY OF SACRAMENTO MAIN | 516189 | 151211 | 159508 | 62060 | 1249 | 98585 | 9242 | 34334 | 29.29 | 30.90 | 12.02 | 0.24 | 19.10 | 1.79 | 6.65 | 508800 | 77003 | 431797 | 15.13 | 194000 | 9540 | 9401 | 6217 | 6407 | 5804 | 6255 | 6278 | 6139 | 6729 | 13349 | 17396 | 26982 | 20453 | 15080 | 17439 | 20531 | 31564 | 31205 | 30745 | 62769 | 57728 | 35533 | 67435 | 21769 | 8217 | 29857 | 3476 | 1805 | 96708 | 47510 | 24524 | 37.50 | 17.81 | 2.609594 | 84694.02 | 39105.61 | POLYGON ((-133314 51929.51,... |
| DEL PASO MANOR COUNTY WATER DI | 5592 | 687 | 3967 | 390 | 15 | 119 | 31 | 382 | 12.28 | 70.95 | 6.97 | 0.26 | 2.13 | 0.56 | 6.84 | 5592 | 621 | 4971 | 11.10 | 2222 | 170 | 45 | 54 | 66 | 21 | 51 | 66 | 237 | 40 | 158 | 278 | 166 | 171 | 120 | 347 | 231 | 336 | 416 | 436 | 752 | 601 | 291 | 922 | 326 | 189 | 572 | 112 | 68 | 729 | 509 | 114 | 42.59 | 16.67 | 2.516895 | 90374.38 | 40254.83 | POLYGON ((-120068.3 65980.9... |
| DELTA CROSSING MHP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 69.19 | 28.71 | 0.00 | 0.00 | 0.00 | 0.00 | 2.10 | 0 | 0 | 0 | 17.42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 45.66 | 25.57 | 2.550000 | 56250.00 | 23510.00 | POLYGON ((-132498.4 40410.2... |
| EAST WALNUT GROVE [SWS] | 3 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 44.60 | 45.84 | 0.00 | 0.00 | 5.93 | 0.00 | 3.63 | 3 | 1 | 3 | 15.75 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 24.49 | 14.65 | 2.490000 | 68248.00 | 38950.00 | POLYGON ((-132506.3 25966.4... |
| EDGEWATER MOBILE HOME PARK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.90 | 89.23 | 3.23 | 0.00 | 0.00 | 0.00 | 3.63 | 0 | 0 | 0 | 35.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.02 | 23.19 | 1.790000 | 38125.00 | 33103.00 | POLYGON ((-153562.3 7972.28... |
| EL DORADO MOBILE HOME PARK | 139 | 84 | 11 | 15 | 0 | 19 | 0 | 11 | 60.26 | 7.80 | 10.48 | 0.00 | 13.27 | 0.00 | 8.19 | 139 | 60 | 79 | 43.12 | 48 | 6 | 10 | 0 | 4 | 6 | 1 | 0 | 8 | 1 | 7 | 0 | 1 | 0 | 4 | 0 | 1 | 19 | 15 | 8 | 34 | 9 | 4 | 3 | 0 | 0 | 10 | 5 | 5 | 35 | 17 | 10 | 46.70 | 31.09 | 2.710000 | 29468.00 | 17394.00 | POLYGON ((-124341.2 53660.1... |
| EL DORADO WEST MHP | 148 | 89 | 12 | 16 | 0 | 20 | 0 | 12 | 60.26 | 7.80 | 10.48 | 0.00 | 13.27 | 0.00 | 8.19 | 147 | 63 | 84 | 43.12 | 51 | 6 | 10 | 0 | 4 | 6 | 1 | 0 | 8 | 2 | 8 | 0 | 1 | 0 | 5 | 0 | 1 | 20 | 16 | 8 | 37 | 9 | 5 | 3 | 0 | 0 | 10 | 6 | 6 | 38 | 18 | 10 | 46.70 | 31.09 | 2.710000 | 29468.00 | 17394.00 | POLYGON ((-124532.3 53662.9... |
| ELEVEN OAKS MOBILE HOME COMMUNITY | 233 | 45 | 94 | 56 | 0 | 37 | 0 | 1 | 19.27 | 40.19 | 24.01 | 0.00 | 15.91 | 0.00 | 0.62 | 233 | 87 | 146 | 37.48 | 71 | 7 | 2 | 3 | 6 | 10 | 2 | 1 | 1 | 3 | 1 | 13 | 17 | 3 | 0 | 3 | 0 | 17 | 17 | 15 | 34 | 32 | 3 | 8 | 3 | 1 | 21 | 1 | 1 | 42 | 29 | 23 | 46.85 | 35.36 | 3.280000 | 60521.00 | 18213.00 | POLYGON ((-119819.8 71950.9... |
| ELK GROVE WATER SERVICE | 42647 | 7656 | 19550 | 3209 | 70 | 8939 | 388 | 2835 | 17.95 | 45.84 | 7.53 | 0.16 | 20.96 | 0.91 | 6.65 | 42258 | 3264 | 38994 | 7.72 | 13239 | 430 | 202 | 253 | 224 | 328 | 102 | 345 | 292 | 245 | 667 | 1117 | 1441 | 1470 | 1386 | 1907 | 2832 | 1108 | 1311 | 1784 | 2420 | 3225 | 2856 | 7552 | 1903 | 628 | 2861 | 283 | 113 | 2826 | 1595 | 864 | 28.55 | 12.12 | 3.179068 | 122771.00 | 43429.03 | POLYGON ((-118730.1 42496.7... |
| FAIR OAKS WATER DISTRICT | 36003 | 4655 | 27050 | 708 | 94 | 1372 | 12 | 2113 | 12.93 | 75.13 | 1.97 | 0.26 | 3.81 | 0.03 | 5.87 | 35775 | 2852 | 32923 | 7.97 | 14233 | 546 | 332 | 113 | 229 | 208 | 391 | 206 | 469 | 293 | 804 | 1064 | 2214 | 1447 | 1568 | 1875 | 2474 | 1220 | 1568 | 1868 | 2788 | 4082 | 3016 | 7090 | 1872 | 845 | 3092 | 261 | 108 | 4051 | 1844 | 768 | 27.94 | 12.09 | 2.480217 | 107985.74 | 54435.01 | POLYGON ((-112317.5 69577.6... |
| FLORIN COUNTY WATER DISTRICT | 9951 | 2963 | 1548 | 1394 | 7 | 2743 | 866 | 430 | 29.78 | 15.56 | 14.01 | 0.07 | 27.56 | 8.70 | 4.32 | 9835 | 1285 | 8550 | 13.06 | 2755 | 84 | 125 | 53 | 154 | 103 | 46 | 86 | 176 | 224 | 258 | 223 | 432 | 297 | 215 | 143 | 137 | 417 | 635 | 481 | 1051 | 913 | 512 | 981 | 426 | 90 | 675 | 49 | 28 | 1100 | 476 | 260 | 34.48 | 13.70 | 3.573005 | 67048.12 | 24517.64 | POLYGON ((-122791.9 52602.2... |
| FOLSOM STATE PRISON | 3536 | 1257 | 652 | 1390 | 57 | 70 | 34 | 77 | 35.55 | 18.43 | 39.31 | 1.60 | 1.97 | 0.96 | 2.17 | 29 | 1 | 28 | 2.20 | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 12 | 1 | 0 | 0 | 0 | 0 | 1 | 8 | 3 | 1 | 0 | 0 | 0 | 0 | 19 | 0 | 0 | 4.67 | 0.53 | 2.726311 | 161047.22 | 2271.22 | POLYGON ((-99838.11 75350.0... |
| FOLSOM, CITY OF - ASHLAND | 3845 | 318 | 2934 | 43 | 1 | 125 | 1 | 423 | 8.26 | 76.32 | 1.12 | 0.03 | 3.26 | 0.02 | 10.99 | 3780 | 143 | 3637 | 3.79 | 1800 | 44 | 17 | 104 | 43 | 34 | 209 | 103 | 74 | 43 | 43 | 158 | 248 | 132 | 80 | 123 | 345 | 208 | 463 | 201 | 670 | 449 | 212 | 594 | 164 | 90 | 847 | 368 | 82 | 358 | 196 | 74 | 40.42 | 13.70 | 2.087286 | 76810.17 | 56773.97 | POLYGON ((-102605.9 74922.1... |
| FOLSOM, CITY OF - MAIN | 62462 | 8433 | 35222 | 1693 | 105 | 12934 | 177 | 3897 | 13.50 | 56.39 | 2.71 | 0.17 | 20.71 | 0.28 | 6.24 | 62115 | 3405 | 58710 | 5.48 | 22409 | 807 | 218 | 390 | 477 | 418 | 283 | 329 | 373 | 451 | 670 | 1181 | 2255 | 2382 | 1747 | 4083 | 6344 | 1892 | 1855 | 1851 | 3747 | 4106 | 4129 | 11491 | 2728 | 1179 | 3590 | 237 | 146 | 7328 | 3010 | 1321 | 26.66 | 11.81 | 2.769356 | 141856.37 | 58469.35 | POLYGON ((-101870.6 66094.5... |
| FREEPORT MARINA | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 69.19 | 28.71 | 0.00 | 0.00 | 0.00 | 0.00 | 2.10 | 3 | 1 | 3 | 17.42 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 45.66 | 25.57 | 2.550000 | 56250.00 | 23510.00 | POLYGON ((-130970.7 50553.3... |
| GALT, CITY OF | 21490 | 9314 | 9952 | 520 | 22 | 872 | 20 | 789 | 43.34 | 46.31 | 2.42 | 0.10 | 4.06 | 0.09 | 3.67 | 21341 | 1404 | 19937 | 6.58 | 6988 | 139 | 168 | 243 | 210 | 141 | 342 | 161 | 347 | 152 | 550 | 687 | 807 | 1096 | 504 | 789 | 650 | 761 | 1143 | 1237 | 1904 | 2044 | 1601 | 3724 | 907 | 523 | 1454 | 109 | 44 | 1809 | 906 | 414 | 27.52 | 14.05 | 3.048249 | 90632.93 | 33685.54 | MULTIPOLYGON (((-113921.6 2... |
| GOLDEN STATE WATER CO - ARDEN WATER SERV | 6556 | 1706 | 2887 | 322 | 0 | 888 | 11 | 742 | 26.02 | 44.04 | 4.91 | 0.00 | 13.54 | 0.16 | 11.32 | 6453 | 1626 | 4828 | 25.19 | 2173 | 19 | 82 | 19 | 141 | 53 | 173 | 34 | 179 | 37 | 139 | 351 | 319 | 132 | 172 | 141 | 183 | 262 | 476 | 490 | 738 | 809 | 303 | 728 | 239 | 123 | 131 | 0 | 0 | 1315 | 599 | 335 | 38.56 | 21.09 | 2.897716 | 66579.36 | 30417.36 | POLYGON ((-121143.9 63698.4... |
| GOLDEN STATE WATER CO. - CORDOVA | 48115 | 9009 | 26042 | 3982 | 229 | 6050 | 188 | 2615 | 18.72 | 54.13 | 8.28 | 0.48 | 12.57 | 0.39 | 5.43 | 47835 | 4408 | 43427 | 9.21 | 18022 | 509 | 482 | 310 | 496 | 480 | 437 | 389 | 469 | 598 | 1276 | 1692 | 2653 | 2565 | 1671 | 1948 | 2047 | 1796 | 2374 | 2968 | 4170 | 5621 | 4236 | 7380 | 2174 | 836 | 3506 | 364 | 201 | 7137 | 2744 | 1410 | 29.31 | 13.58 | 2.650717 | 96697.06 | 42695.41 | POLYGON ((-112985.4 62375.3... |
| HAPPY HARBOR (SWS) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.90 | 89.23 | 3.23 | 0.00 | 0.00 | 0.00 | 3.63 | 0 | 0 | 0 | 35.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.02 | 23.19 | 1.790000 | 38125.00 | 33103.00 | POLYGON ((-139842 11256.85,... |
| HOLIDAY MOBILE VILLAGE | 46 | 18 | 7 | 3 | 0 | 15 | 0 | 3 | 38.66 | 15.12 | 7.10 | 0.00 | 32.49 | 0.00 | 6.64 | 46 | 10 | 36 | 22.33 | 16 | 2 | 1 | 0 | 1 | 0 | 1 | 5 | 1 | 0 | 0 | 2 | 2 | 1 | 0 | 0 | 0 | 4 | 7 | 2 | 11 | 4 | 1 | 2 | 0 | 0 | 2 | 1 | 1 | 12 | 6 | 4 | 44.88 | 28.55 | 2.860000 | 38491.00 | 16707.00 | POLYGON ((-123874.7 52485.3... |
| HOOD WATER MAINTENCE DIST [SWS] | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 69.19 | 28.71 | 0.00 | 0.00 | 0.00 | 0.00 | 2.10 | 1 | 0 | 1 | 17.42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 45.66 | 25.57 | 2.550000 | 56250.00 | 23510.00 | MULTIPOLYGON (((-132506 403... |
| IMPERIAL MANOR MOBILEHOME COMMUNITY | 209 | 52 | 129 | 1 | 0 | 6 | 0 | 21 | 24.93 | 61.63 | 0.45 | 0.00 | 2.93 | 0.00 | 10.05 | 209 | 45 | 164 | 21.48 | 124 | 4 | 26 | 18 | 3 | 0 | 16 | 7 | 5 | 6 | 1 | 4 | 29 | 0 | 0 | 0 | 6 | 51 | 34 | 5 | 84 | 34 | 0 | 9 | 0 | 0 | 89 | 37 | 34 | 27 | 27 | 22 | 50.97 | 45.07 | 1.680363 | 31831.84 | 32878.17 | POLYGON ((-115390.2 74250.3... |
| KORTHS PIRATES LAIR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.90 | 89.23 | 3.23 | 0.00 | 0.00 | 0.00 | 3.63 | 0 | 0 | 0 | 35.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.02 | 23.19 | 1.790000 | 38125.00 | 33103.00 | POLYGON ((-137314.9 10213.1... |
| LAGUNA DEL SOL INC | 24 | 5 | 18 | 0 | 0 | 0 | 0 | 0 | 21.55 | 75.20 | 0.00 | 0.67 | 1.46 | 0.00 | 1.12 | 24 | 2 | 22 | 6.40 | 9 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 2 | 2 | 1 | 0 | 3 | 2 | 1 | 5 | 2 | 2 | 3 | 0 | 0 | 2 | 0 | 0 | 23.37 | 23.37 | 2.640000 | 95227.00 | 50793.00 | POLYGON ((-104662.2 49197.3... |
| LAGUNA VILLAGE RV PARK | 20 | 3 | 2 | 1 | 0 | 11 | 2 | 2 | 12.79 | 8.48 | 7.28 | 0.00 | 52.62 | 8.38 | 10.45 | 20 | 2 | 18 | 11.79 | 7 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 3 | 1 | 0 | 1 | 0 | 0 | 3 | 1 | 0 | 32.52 | 12.26 | 3.030000 | 84332.00 | 32668.00 | POLYGON ((-122461.8 48066.6... |
| LINCOLN CHAN-HOME RANCH | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 44.60 | 45.84 | 0.00 | 0.00 | 5.93 | 0.00 | 3.63 | 4 | 1 | 3 | 15.75 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 24.49 | 14.65 | 2.490000 | 68248.00 | 38950.00 | POLYGON ((-136788.6 36526.1... |
| LOCKE WATER WORKS CO [SWS] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 44.60 | 45.84 | 0.00 | 0.00 | 5.93 | 0.00 | 3.63 | 1 | 0 | 1 | 15.75 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24.49 | 14.65 | 2.490000 | 68248.00 | 38950.00 | POLYGON ((-131952.8 27176.6... |
| MAGNOLIA MUTUAL WATER | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 44.60 | 45.84 | 0.00 | 0.00 | 5.93 | 0.00 | 3.63 | 1 | 0 | 1 | 15.75 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24.49 | 14.65 | 2.490000 | 68248.00 | 38950.00 | POLYGON ((-137022.9 36118.9... |
| MC CLELLAN MHP | 269 | 52 | 108 | 65 | 0 | 43 | 0 | 2 | 19.27 | 40.19 | 24.01 | 0.00 | 15.91 | 0.00 | 0.62 | 269 | 101 | 168 | 37.48 | 82 | 8 | 2 | 3 | 7 | 11 | 2 | 2 | 1 | 3 | 1 | 15 | 20 | 3 | 0 | 3 | 0 | 20 | 19 | 17 | 39 | 36 | 3 | 9 | 4 | 2 | 25 | 1 | 1 | 48 | 34 | 27 | 46.85 | 35.36 | 3.280000 | 60521.00 | 18213.00 | POLYGON ((-119814.9 72169.0... |
| OLYMPIA MOBILODGE | 290 | 70 | 81 | 18 | 0 | 101 | 16 | 3 | 24.12 | 28.03 | 6.30 | 0.00 | 34.95 | 5.53 | 1.08 | 290 | 68 | 222 | 23.43 | 114 | 11 | 0 | 6 | 10 | 9 | 3 | 13 | 0 | 0 | 10 | 19 | 8 | 3 | 12 | 5 | 5 | 28 | 25 | 29 | 53 | 36 | 14 | 31 | 22 | 10 | 51 | 12 | 10 | 33 | 9 | 7 | 37.35 | 23.74 | 2.510000 | 53786.00 | 29451.00 | POLYGON ((-123342.4 53061.6... |
| ORANGE VALE WATER COMPANY | 17387 | 2658 | 12308 | 241 | 181 | 633 | 86 | 1281 | 15.28 | 70.79 | 1.39 | 1.04 | 3.64 | 0.49 | 7.37 | 17288 | 1904 | 15384 | 11.01 | 6595 | 389 | 111 | 61 | 94 | 226 | 58 | 274 | 120 | 181 | 372 | 752 | 990 | 901 | 626 | 678 | 766 | 655 | 858 | 1123 | 1512 | 2113 | 1526 | 3246 | 1021 | 453 | 1686 | 315 | 185 | 1663 | 693 | 305 | 30.77 | 14.29 | 2.608348 | 92693.71 | 42509.89 | POLYGON ((-108131.3 74330.4... |
| PLANTATION MOBILE HOME PARK | 10 | 4 | 1 | 1 | 0 | 3 | 0 | 1 | 38.66 | 15.12 | 7.10 | 0.00 | 32.49 | 0.00 | 6.64 | 10 | 2 | 7 | 22.33 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 44.88 | 28.55 | 2.860000 | 38491.00 | 16707.00 | POLYGON ((-124180.4 53321.5... |
| RANCHO MARINA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.90 | 89.23 | 3.23 | 0.00 | 0.00 | 0.00 | 3.63 | 0 | 0 | 0 | 35.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.02 | 23.19 | 1.790000 | 38125.00 | 33103.00 | POLYGON ((-138041.4 11320.9... |
| RANCHO MURIETA COMMUNITY SERVI | 3239 | 661 | 2157 | 120 | 7 | 188 | 0 | 106 | 20.42 | 66.59 | 3.71 | 0.21 | 5.80 | 0.00 | 3.26 | 3239 | 199 | 3040 | 6.13 | 1402 | 59 | 42 | 0 | 6 | 5 | 18 | 74 | 27 | 75 | 44 | 81 | 88 | 118 | 204 | 241 | 319 | 108 | 199 | 125 | 307 | 213 | 323 | 1029 | 205 | 103 | 270 | 63 | 57 | 103 | 41 | 40 | 22.02 | 14.30 | 2.307704 | 144993.81 | 66451.34 | POLYGON ((-92457.85 52674.7... |
| RIO COSUMNES CORRECTIONAL CENTER [SWS] | 22 | 6 | 8 | 4 | 1 | 1 | 0 | 2 | 25.74 | 37.49 | 16.82 | 2.97 | 4.50 | 1.81 | 10.66 | 4 | 0 | 4 | 0.00 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23.75 | 0.00 | 3.450000 | 115897.00 | 11095.00 | POLYGON ((-124032.5 32206.2... |
| RIO LINDA/ELVERTA COMMUNITY WATER DIST | 11831 | 2585 | 7595 | 337 | 17 | 765 | 21 | 512 | 21.85 | 64.19 | 2.85 | 0.14 | 6.46 | 0.18 | 4.33 | 11829 | 1619 | 10210 | 13.69 | 3762 | 177 | 156 | 67 | 169 | 56 | 113 | 116 | 114 | 118 | 173 | 297 | 607 | 492 | 431 | 416 | 259 | 569 | 518 | 470 | 1087 | 1077 | 922 | 1918 | 573 | 157 | 773 | 114 | 47 | 1070 | 519 | 340 | 32.07 | 14.49 | 3.123012 | 83603.04 | 33734.49 | POLYGON ((-126609.8 73568.2... |
| RIVER'S EDGE MARINA & RESORT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.90 | 89.23 | 3.23 | 0.00 | 0.00 | 0.00 | 3.63 | 0 | 0 | 0 | 35.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.02 | 23.19 | 1.790000 | 38125.00 | 33103.00 | POLYGON ((-141102.2 11867.3... |
| SAC CITY MOBILE HOME COMMUNITY LP | 229 | 82 | 17 | 7 | 0 | 123 | 0 | 0 | 35.66 | 7.50 | 3.27 | 0.00 | 53.57 | 0.00 | 0.00 | 229 | 110 | 119 | 48.14 | 89 | 11 | 16 | 9 | 10 | 8 | 0 | 0 | 4 | 2 | 7 | 1 | 13 | 4 | 4 | 0 | 0 | 46 | 14 | 8 | 60 | 21 | 8 | 4 | 2 | 2 | 15 | 2 | 0 | 71 | 41 | 30 | 48.95 | 35.43 | 2.530000 | 22380.00 | 16689.00 | POLYGON ((-124544.3 56147.0... |
| SACRAMENTO SUBURBAN WATER DISTRICT | 193126 | 43047 | 97872 | 17684 | 834 | 20602 | 624 | 12464 | 22.29 | 50.68 | 9.16 | 0.43 | 10.67 | 0.32 | 6.45 | 190984 | 33399 | 157585 | 17.49 | 72505 | 3817 | 3001 | 3069 | 2884 | 3205 | 3100 | 3337 | 2893 | 2342 | 5541 | 6792 | 10037 | 6480 | 4342 | 5488 | 6177 | 12771 | 14878 | 12333 | 27649 | 22370 | 10822 | 23467 | 7204 | 2837 | 12037 | 2087 | 1160 | 37001 | 21072 | 10274 | 41.88 | 19.68 | 2.635471 | 73746.51 | 35321.18 | MULTIPOLYGON (((-122206.9 6... |
| SAN JUAN WATER DISTRICT | 30122 | 3409 | 21349 | 831 | 287 | 2762 | 17 | 1467 | 11.32 | 70.87 | 2.76 | 0.95 | 9.17 | 0.06 | 4.87 | 30014 | 1718 | 28297 | 5.72 | 10750 | 389 | 168 | 100 | 275 | 128 | 160 | 111 | 133 | 127 | 472 | 684 | 984 | 854 | 876 | 1032 | 4256 | 932 | 658 | 1156 | 1591 | 2141 | 1730 | 6210 | 1754 | 724 | 2883 | 528 | 357 | 1658 | 726 | 339 | 27.98 | 13.21 | 2.783858 | 160696.10 | 72978.42 | POLYGON ((-104526.8 73044.7... |
| SCWA - ARDEN PARK VISTA | 8086 | 990 | 6016 | 270 | 12 | 396 | 8 | 395 | 12.24 | 74.40 | 3.33 | 0.15 | 4.90 | 0.10 | 4.88 | 8038 | 523 | 7515 | 6.51 | 3303 | 79 | 36 | 48 | 77 | 65 | 38 | 18 | 49 | 162 | 139 | 187 | 253 | 465 | 208 | 416 | 1065 | 241 | 330 | 326 | 571 | 579 | 673 | 1823 | 520 | 112 | 673 | 76 | 23 | 807 | 384 | 225 | 29.69 | 10.90 | 2.424845 | 139081.65 | 84548.46 | POLYGON ((-120985.4 62883.8... |
| SCWA - LAGUNA/VINEYARD | 145495 | 27502 | 38496 | 16568 | 246 | 50411 | 2220 | 10052 | 18.90 | 26.46 | 11.39 | 0.17 | 34.65 | 1.53 | 6.91 | 145198 | 14710 | 130489 | 10.13 | 45137 | 1692 | 666 | 742 | 878 | 839 | 1336 | 850 | 788 | 752 | 2363 | 3198 | 6037 | 5323 | 5057 | 6578 | 8038 | 3978 | 4565 | 5561 | 8543 | 11598 | 10380 | 24581 | 7232 | 2916 | 7878 | 861 | 471 | 12677 | 6368 | 3337 | 32.04 | 14.90 | 3.207447 | 114494.03 | 41415.71 | MULTIPOLYGON (((-126550 404... |
| SCWA MATHER-SUNRISE | 18249 | 2708 | 8114 | 1553 | 23 | 4507 | 164 | 1180 | 14.84 | 44.47 | 8.51 | 0.12 | 24.70 | 0.90 | 6.47 | 18211 | 1005 | 17206 | 5.52 | 5503 | 228 | 35 | 97 | 57 | 68 | 39 | 12 | 20 | 36 | 189 | 320 | 533 | 645 | 755 | 1003 | 1469 | 416 | 174 | 509 | 590 | 1042 | 1399 | 3756 | 881 | 266 | 855 | 60 | 43 | 893 | 318 | 167 | 22.89 | 8.66 | 3.296327 | 147818.01 | 47448.37 | MULTIPOLYGON (((-112526.7 5... |
| SEQUOIA WATER ASSOC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 44.60 | 45.84 | 0.00 | 0.00 | 5.93 | 0.00 | 3.63 | 0 | 0 | 0 | 15.75 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24.49 | 14.65 | 2.490000 | 68248.00 | 38950.00 | POLYGON ((-136929.5 36128.1... |
| SOUTHWEST TRACT W M D [SWS] | 174 | 29 | 42 | 24 | 3 | 75 | 1 | 0 | 16.58 | 24.48 | 13.69 | 1.55 | 43.11 | 0.60 | 0.00 | 174 | 38 | 136 | 21.83 | 57 | 1 | 2 | 7 | 0 | 7 | 0 | 0 | 10 | 12 | 3 | 2 | 5 | 0 | 1 | 2 | 4 | 10 | 29 | 6 | 39 | 10 | 1 | 3 | 1 | 0 | 8 | 0 | 0 | 45 | 29 | 7 | 52.53 | 12.40 | 3.040000 | 45671.00 | 36348.00 | MULTIPOLYGON (((-125843.6 5... |
| SPINDRIFT MARINA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.90 | 89.23 | 3.23 | 0.00 | 0.00 | 0.00 | 3.63 | 0 | 0 | 0 | 35.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.02 | 23.19 | 1.790000 | 38125.00 | 33103.00 | POLYGON ((-139920.3 11468.3... |
| TOKAY PARK WATER CO | 652 | 214 | 134 | 37 | 0 | 239 | 0 | 28 | 32.80 | 20.55 | 5.61 | 0.00 | 36.69 | 0.00 | 4.35 | 652 | 113 | 539 | 17.29 | 173 | 2 | 2 | 3 | 21 | 0 | 0 | 13 | 13 | 10 | 18 | 27 | 36 | 14 | 4 | 10 | 0 | 27 | 36 | 45 | 64 | 81 | 18 | 81 | 38 | 11 | 44 | 0 | 0 | 48 | 32 | 12 | 40.57 | 13.58 | 3.757973 | 62802.24 | 19400.05 | POLYGON ((-122824.8 54197.9... |
| TUNNEL TRAILER PARK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 49.74 | 34.94 | 0.00 | 0.00 | 4.65 | 0.00 | 10.67 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.30 | 0.00 | 2.950000 | 153092.00 | 42507.00 | POLYGON ((-136160.9 24171.2... |
| VIEIRA'S RESORT, INC | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 41.43 | 52.47 | 0.00 | 0.00 | 4.55 | 0.00 | 1.56 | 4 | 1 | 3 | 22.60 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 40.53 | 21.84 | 2.030000 | 51977.00 | 40522.00 | POLYGON ((-143780.4 18567.4... |
| WESTERNER MOBILE HOME PARK | 32 | 6 | 6 | 9 | 0 | 10 | 0 | 1 | 17.59 | 17.62 | 28.31 | 0.55 | 31.36 | 0.00 | 4.57 | 31 | 7 | 24 | 23.76 | 10 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 1 | 1 | 2 | 0 | 1 | 0 | 1 | 2 | 3 | 3 | 4 | 2 | 4 | 2 | 1 | 1 | 0 | 0 | 5 | 3 | 2 | 56.87 | 29.49 | 3.160000 | 59296.00 | 23437.00 | POLYGON ((-122657.2 48977.8... |
6.4.4 Clean & Format Intermediate (Clipped) Data
For visualization and exploration, it’ll also be useful to apply some additional formatting to the clipped block-group data used in intermediate parts of the interpolation process above.
# portion of households paying more than 30% / 50% of income on housing
census_data_interpolate <- census_data_interpolate %>%
mutate(households_all_housing_costs_over30pct_percent =
100 * (households_mortgage_housing_costs_over30pct_count +
households_no_mortgage_housing_costs_over30pct_count +
households_rent_housing_costs_over30pct_count) /
households_count,
.after = households_rent_housing_costs_over50pct_count) %>%
mutate(households_all_housing_costs_over50pct_percent =
100 * (households_mortgage_housing_costs_over50pct_count +
households_no_mortgage_housing_costs_over50pct_count +
households_rent_housing_costs_over50pct_count) /
households_count,
.after = households_all_housing_costs_over30pct_percent)
# drop water system data except name (water_system_name)
census_data_interpolate <- census_data_interpolate %>%
select(-census_unit_area, -clipped_area) %>%
select(-c(water_system_number:water_system_population_reported)) %>%
select(-c(average_household_size_weighted:per_capita_income_weighted)) %>%
relocate(water_system_name, .after = NAME) %>%
relocate(areal_weight_factor, .after = water_system_name)6.4.5 Transform Results to Long Format
For further analysis and exploration / visualization of the results, it will help to convert the results from wide to long format, and edit the group names so that they can be used as titles.
# pivot from wide to long format
water_system_demographics_long <- water_system_demographics %>%
# convert to long format
# st_drop_geometry() %>%
pivot_longer(cols = !c(water_system_name, geometry),
names_to = 'variable',
values_to = 'value') %>%
relocate(geometry, .after = last_col())
# clean variable names and add grouping fields (type, group_type)
water_system_demographics_long <- water_system_demographics_long %>%
mutate(variable = variable %>%
# str_remove_all(pattern = 'percent_') %>%
str_replace_all(pattern = '_', replacement = ' ') %>%
str_replace_all(pattern = ' or ', replacement = ' / ') %>%
str_to_title(.) %>%
str_remove_all(pattern = ' / Alaska Native')) %>%
mutate(variable_type = case_when(
str_detect(variable, pattern = 'Count') ~ 'Count',
str_detect(variable, pattern = 'Percent') ~ 'Percent',
str_detect(variable, pattern = 'Pop Weighted') ~ 'Pop Weighted',
str_detect(variable, pattern = 'Hh Weighted') ~ 'Hh Weighted',
.default = NA),
.after = variable) %>%
mutate(variable_group_type = case_when(
str_detect(variable, pattern ='Population') ~
'Population',
str_detect(variable, pattern = 'Households') ~
'Households',
str_detect(variable, pattern = 'Average Household Size Hh Weighted') ~
'Household Weighted',
str_detect(variable, pattern = 'Median Household Income Hh Weighted') ~
'Household Weighted',
str_detect(variable, pattern = 'Per Capita Income Pop Weighted') ~
'Population Weighted',
str_detect(variable, pattern = 'Poverty') ~
'Population'),
.after = variable_type) %>%
mutate(variable = case_when(
str_detect(variable, pattern = 'Households Count') ~
'Households Total',
.default = str_remove_all(variable, pattern = 'Households'))) %>%
mutate(variable = case_when(
str_detect(variable, 'Population Total Count') ~
'Population Total',
.default = str_remove_all(variable, 'Population'))) %>%
mutate(variable = str_remove_all(variable,
pattern = 'Count')) %>%
mutate(variable = str_remove_all(variable,
pattern = 'Percent')) %>%
mutate(variable = str_remove_all(variable,
pattern = ' Hh Weighted')) %>%
mutate(variable = str_remove_all(variable,
pattern = ' Pop Weighted')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Over30pct',
replacement = 'Over 30% Income')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Over50pct',
replacement = 'Over 50% Income')) %>%
mutate(variable = str_trim(variable)) %>%
mutate(variable = str_replace_all(variable,
pattern = 'k ',
replacement = 'k-')) %>%
mutate(variable = str_replace_all(variable,
pattern = '0 ',
replacement = '0-')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Black-',
replacement = 'Black ')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Mortgage ',
replacement = 'Mortgage - ')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Rent ',
replacement = 'Rent - ')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'All ',
replacement = 'All Households - ')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Households Total',
replacement = 'Total Households')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Population Total',
replacement = 'Total Population')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Poverty ',
replacement = 'Poverty - ')) %>%
mutate(variable = str_replace_all(variable,
pattern = 'Poverty - Rate',
replacement = 'Poverty Rate'))Here’s a view of the structure of the reformatted data:
glimpse(water_system_demographics_long)Rows: 3,472
Columns: 6
$ water_system_name <chr> "B & W RESORT MARINA", "B & W RESORT MARINA", "B &…
$ variable <chr> "Total Population", "Hispanic / Latino", "White", …
$ variable_type <chr> "Count", "Count", "Count", "Count", "Count", "Coun…
$ variable_group_type <chr> "Population", "Population", "Population", "Populat…
$ value <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 41…
$ geometry <POLYGON [m]> POLYGON ((-138282.2 13643.2..., POLYGON ((…
We can also do the same for the clipped block-group data used in intermediate parts of the interpolation process above.
census_data_interpolate_long <- census_data_interpolate %>%
pivot_longer(cols = !c(GEOID, NAME, water_system_name,
areal_weight_factor, geometry),
names_to = 'variable',
values_to = 'value') %>%
relocate(geometry, .after = last_col())6.5 Save Results
Once we’ve finished the computations and verified the outputs look reasonable, we can save the results to output files so they can be re-used and shared. The results can be saved in tabular (e.g., csv, excel) and/or spatial (e.g., shapefile, geopackage) formats, which may be helpful for different use cases. Note that you may need to think about exactly what variables to include in the output file(s) and how to format the output datasets (e.g., wide versus long format).
The files saved below are all available here.
The chunk of code below (which is hidden by default), just tests to see whether any of the datasets to be saved have been changed since the previous version was saved. In general this is probably not needed for a typical workflow and can be ignored for most use cases – it is just used here to make rendering of this document a little more efficient.
Code
# compute hash for datasets to be saved (i.e., a unique identifier for each dataset), and compare against previous versions
## define file that stores hash (unique identifier for dataset)
hash_file <- here('03_data_results',
'_dataset_hash.csv')
## compute hashes (unique identifier for datasets)
hash_current <- digest(object = water_system_demographics,
algo = 'md5')
hash_current_long <- digest(object = water_system_demographics_long,
algo = 'md5')
hash_interpolate <- digest(object = census_data_interpolate,
algo = 'md5')
hash_interpolate_long <- digest(object = census_data_interpolate_long,
algo = 'md5')
hash_table_current <- tibble(
dataset = c('water_system_demographics',
'water_system_demographics_long',
'census_data_interpolate',
'census_data_interpolate_long'),
hash = c(hash_current,
hash_current_long,
hash_interpolate,
hash_interpolate_long))
## get the previous hashes from file (if it exists), else create a new file to store the hashes
if (file.exists(hash_file)) {
hash_table_previous <- read_csv(file = hash_file)
} else {
file.create(hash_file)
hash_table_previous <- tibble(
dataset = c('water_system_demographics',
'water_system_demographics_long',
'census_data_interpolate',
'census_data_interpolate_long'),
hash = c('missing',
'missing',
'missing',
'missing'))
}
## if new hash is different from previous hash, set flag to update the output file (i.e., write a new version of the file)
file_update <- !identical(hash_table_current %>%
filter(dataset == 'water_system_demographics') %>%
pull(hash),
hash_table_previous %>%
filter(dataset == 'water_system_demographics') %>%
pull(hash))
file_update_long <- !identical(hash_table_current %>%
filter(dataset == 'water_system_demographics_long') %>%
pull(hash),
hash_table_previous %>%
filter(dataset == 'water_system_demographics_long') %>%
pull(hash))
file_update_interpolate <- !identical(hash_table_current %>%
filter(dataset == 'census_data_interpolate') %>%
pull(hash),
hash_table_previous %>%
filter(dataset == 'census_data_interpolate') %>%
pull(hash))
file_update_interpolate_long <- !identical(hash_table_current %>%
filter(dataset == 'census_data_interpolate_long') %>%
pull(hash),
hash_table_previous %>%
filter(dataset == 'census_data_interpolate_long') %>%
pull(hash))
## write current hashes to file (for comparison with future versions)
write_csv(x = hash_table_current,
file = hash_file,
append = FALSE)6.5.1 Tabular Dataset
The code below saves the tabular results to a csv file, in both the ‘wide’ and ‘long’ formats. The wide format data can also be viewed here, or downloaded with this link. The long format data can be viewed here, or downloaded with this link.
# wide
if (file_update == TRUE) {
write_csv(water_system_demographics %>%
st_drop_geometry(), # drop spatial data
file = here('03_data_results',
'water_system_demographics_sac.csv'))
}
# long
if (file_update_long == TRUE) {
write_csv(water_system_demographics_long %>%
st_drop_geometry(), # drop spatial data
file = here('03_data_results',
'water_system_demographics_sac_long.csv'))
}And we can save the intermediate data from the interpolation process (i.e., data for clipped block groups) in wide and long format – these files can be downloaded with this link and this link respectively.
# wide
if (file_update_interpolate == TRUE) {
write_csv(census_data_interpolate %>%
st_drop_geometry(), # drop the spatial data
file = here('03_data_results',
'intermediate_interpolation_data.csv'))
}
# long
if (file_update_interpolate_long == TRUE) {
write_csv(census_data_interpolate_long %>%
st_drop_geometry(), # drop the spatial data
file = here('03_data_results',
'intermediate_interpolation_data_long.csv'))
}6.5.2 Spatial Dataset
To save the output in a geospatial format, it may be best to save the data in a wide format, so that all of the attribute data for each target area (water system) is in a single row along with its spatial data (i.e. the system boundary information) (saving in long format may create a very large file). The code below saves the results – in wide format – to a geopackage file, which is a spatial file format that is similar to a shapefile. The final water system demographic data is available to downloaded with this link, and the data from the intermediate calculations (for clipped block groups) is available to download with this link.
if (file_update == TRUE) {
st_write(water_system_demographics,
here('03_data_results',
'water_system_demographics_sac.gpkg'),
append = FALSE)
}
if (file_update_interpolate == TRUE) {
st_write(census_data_interpolate,
here('03_data_results',
'intermediate_interpolation_data.gpkg'),
append = FALSE)
}7 Explore and Visualize Results
This section is in progress.
[TODO: Insert Shiny App (iframe)]
For simplicity, this section will focus on presenting estimated demographics for some of the largest water suppliers in the Sacramento county region (results for small water systems may not be very accurate and should be used with some caution - see Section 8 and Section 10 for more investigation of the results for small systems).
# Select systems to plot
## number of systems
n_systems <- 20
## get list of selected systems
systems_top_n <- water_system_demographics %>%
slice_max(population_total_count, n = n_systems) %>%
pull(water_system_name)7.1 Race / Ethnicity
[placeholder]
percent by group (bar)
dot-density
mapview (non-white)
7.2 Income Distributions
[placeholder]
income brackets (50k) (bar)
median household income by block group (dots)
dot-density (below threshold value)
mapview
7.3 Poverty Rates
[placeholder]
dot plot
mapview
side-by-side map
7.4 Income & Relative Housing Costs
The biscale R package (Prener, Grossenbacher, and Zehr 2022) can be used to create maps that show how two metrics vary together spatially (bivariate choropleth maps).
Figure 5 shows the relationship between estimated income and relative housing costs for the top 20 systems by estimated population in Sacramento County.
Code
# Table B25140 - Housing Costs as a Percentage of Household Income in the past 12 months.
# Shows the count of households paying more than 30% or 50% of their income towards housing
# costs broken out by three tenure categories (owned with a mortgage, owned without a mortgage, and rented).
# set defaults
biscale_pal <- 'BlueOr' # 'GrPink' # 'DkViolet2'
biscale_dim <- 3
# create classes
biscale_data <- bi_class(water_system_demographics %>%
filter(water_system_name %in% systems_top_n) %>%
filter(!is.na(median_household_income_hh_weighted)),
x = households_all_housing_costs_over30pct_percent,
y = median_household_income_hh_weighted,
style = "quantile",
dim = biscale_dim)
# create map
biscale_map <- ggplot() +
geom_sf(data = biscale_data,
mapping = aes(fill = bi_class),
color = "white",
size = 0.1,
show.legend = FALSE) +
bi_scale_fill(pal = biscale_pal,
dim = biscale_dim) +
labs(
title = "Estimated % of Households Paying More Than 30% of Income Towards Housing Costs \nand Estimated Median Household Income in Sacramento Water Systems",
subtitle = glue("Top {n_systems} Water Systems by Population"),
caption = glue("Data estimated from {acs_year} 5-year ACS Block Groups")
# title = "Estimated Housing Cost as % of Household Income and \nEstimated Median Household Income in Sacramento Water Systems",
# caption = "% Housing cost shows the percent of households paying more than 30% of their income towards housing costs \nIncome shows median household income (yellow = missing)"
) +
# labs(
# title = "Housing Cost<sup>1</sup> and Income<sup>2</sup> in Sacramento Water Systems",
# caption = "<sup>1</sup>% of households paying more than 30% of their income towards housing costs<br><sup>2</sup>Median household income (yellow = missing)",
# subtitle = glue("Top {n_systems} systems by population")
# ) +
# add missing polygons back in
geom_sf(data = water_system_demographics %>%
filter(water_system_name %in% systems_top_n) %>%
filter(is.na(median_household_income_hh_weighted)),
color = "white",
fill = 'gold'
) +
geom_sf(data = counties_ca %>% filter(NAME == 'Sacramento'),
color = 'grey',
fill = NA) +
bi_theme() +
theme(plot.title = element_text(size=12), # element_markdown(size=12)
plot.subtitle = element_text(size=10),
plot.caption = element_text(size=8, hjust = 1)) # element_markdown(size=8, hjust = 1))
# create legend
biscale_legend <- bi_legend(pal = biscale_pal,
dim = biscale_dim,
xlab = "% Housing Costs ",
ylab = "Income ",
size = 8)
# construct map
biscale_plot <- ggdraw() +
draw_plot(biscale_map, 0, 0, 1, 1) +
draw_plot(biscale_legend, 0.1, .65, 0.2, 0.2)
biscale_plot
Figure 6 shows the same variables (relative housing costs and income) for the portions block groups overlapping Sacramento Suburban Water District – this illustrates the data underlying the interpolation process.
Code
# set defaults
biscale_pal_system <- 'BlueOr' # 'GrPink' # 'DkViolet2'
biscale_dim_system <- 3
# create classes
biscale_data_system <- bi_class(census_data_interpolate %>%
filter(water_system_name == system_plot) %>%
filter(!is.na(median_household_income)),
x = households_all_housing_costs_over30pct_percent,
y = median_household_income,
style = "quantile",
dim = biscale_dim_system)
# create map
biscale_map_system <- ggplot() +
geom_sf(data = biscale_data_system ,
mapping = aes(fill = bi_class),
color = "white",
size = 0.1,
show.legend = FALSE) +
bi_scale_fill(pal = biscale_pal_system,
dim = biscale_dim_system) +
labs(
title = glue("Estimated % of Households Paying More Than 30% of Income Towards Housing Costs \nand Estimated Median Household Income in {str_to_title(system_plot)}"),
# subtitle = glue(""),
caption = glue("Data from {acs_year} 5-year ACS Block Groups (Yellow = Missing Data)")#,
# title = glue("Housing Cost and Income \nin {str_to_title(system_plot)}"),
# caption = "% Housing cost shows the percent of households paying more than 30% of their income towards housing costs \nIncome shows median household income (yellow = missing)"#,
) +
# add the missing polygons back in
geom_sf(data = census_data_interpolate %>%
filter(water_system_name == system_plot) %>%
filter(is.na(median_household_income)),
color = "white",
fill = 'gold'
) +
bi_theme() +
theme(plot.title = element_text(size=12), # element_markdown(size=12)
plot.subtitle = element_text(size=10),
plot.caption = element_text(size=8, hjust = 1)) # element_markdown(size=8, hjust = 1))
# create legend
biscale_legend <- bi_legend(pal = biscale_pal_system,
dim = biscale_dim_system,
xlab = "% Housing Costs ",
ylab = "Income ",
size = 8)
# construct map
biscale_plot_system <- ggdraw() +
draw_plot(biscale_map_system, 0, 0, 1, 1) +
draw_plot(biscale_legend, 0.1, .55, 0.2, 0.2)
biscale_plot_system
8 Check - Estimated vs Reported Population Estimates
[TO DO: Create map]
Based on the map above, it’s apparent that it will be difficult to obtain reasonable estimates for some suppliers, such as the suppliers with very small service areas in the southern portion of the county where the block groups are very large (and the supplier’s service are is only a small fraction of the total area of the block group). These issues are explored further in Section 10.
Note that there are a number of reasons why the estimated population values are likely to differ from the population numbers in the water system dataset (e.g., the depicted boundaries may not be correct or exact, the supplier may have used different methods to count/estimate the population they serve, the time frames for the estimates may be different, etc.). But, there may also be some cases where the numbers differ significantly – depending on the actual analysis being performed, this may mean that further work is needed for certain areas, or could mean that this method may not be sufficient and different methods are needed.
As a check, we can add a column to the interpolated dataset (which we’ll call population_percent_difference) to compute the difference between the estimated total population (in the population_total field) and the total population listed in the water_system_population_reported field (which is the reported value from the water system dataset).
water_system_demographics_check <- water_system_demographics %>%
left_join(water_systems_sac %>%
st_drop_geometry() %>%
select(water_system_name, water_system_population_reported),
by = 'water_system_name')
water_system_demographics_check <- water_system_demographics_check %>%
mutate(population_percent_difference =
round(100 * (population_total_count - water_system_population_reported) /
water_system_population_reported,
2),
.after = water_system_population_reported)For water systems with a small population and/or service area, the estimated demographics may not match the reported population numbers in the water system dataset very well. You can see this in Table 2 by comparing the population_reported field, which contains the total population values from the water supplier dataset, with the population_estimated field, which contains the total population estimated from the census data; the difference between the two is summarized in the population_percent_difference field. This probably indicates that, for small areas, some adjustments and/or further analysis may be needed, and the preliminary estimated values should be treated with some caution/skepticism.
Note: See Section 10 below for some more investigation into water systems whose estimated population is at or near zero.
water_system_demographics_check %>%
arrange(water_system_population_reported) %>%
slice(1:10) %>%
select(water_system_name,
population_reported = water_system_population_reported,
population_estimated = population_total_count,
population_percent_difference) %>%
st_drop_geometry() %>%
kable()| water_system_name | population_reported | population_estimated | population_percent_difference |
|---|---|---|---|
| DELTA CROSSING MHP | 30 | 0 | -100.00 |
| LAGUNA VILLAGE RV PARK | 32 | 20 | -37.50 |
| LINCOLN CHAN-HOME RANCH | 33 | 4 | -87.88 |
| EDGEWATER MOBILE HOME PARK | 40 | 0 | -100.00 |
| MAGNOLIA MUTUAL WATER | 40 | 1 | -97.50 |
| FREEPORT MARINA | 42 | 3 | -92.86 |
| PLANTATION MOBILE HOME PARK | 44 | 10 | -77.27 |
| TUNNEL TRAILER PARK | 44 | 0 | -100.00 |
| SEQUOIA WATER ASSOC | 54 | 0 | -100.00 |
| HAPPY HARBOR (SWS) | 60 | 0 | -100.00 |
But for larger water systems, the estimated population values seem to be more in line with the population numbers in the original dataset. You can see this in Table 3 by, as above, comparing the population_reported field, which contains the total population values from the water supplier dataset, with the population_estimated field, which contains the total population estimated from the census data; the difference between the two is summarized in the population_percent_difference field.
water_system_demographics_check %>%
arrange(desc(water_system_population_reported)) %>%
slice(1:10) %>%
select(water_system_name,
population_reported = water_system_population_reported,
population_estimated = population_total_count,
population_percent_difference) %>%
st_drop_geometry() %>%
kable()| water_system_name | population_reported | population_estimated | population_percent_difference |
|---|---|---|---|
| CITY OF SACRAMENTO MAIN | 510931 | 516189 | 1.03 |
| SACRAMENTO SUBURBAN WATER DISTRICT | 184385 | 193126 | 4.74 |
| SCWA - LAGUNA/VINEYARD | 172666 | 145495 | -15.74 |
| FOLSOM, CITY OF - MAIN | 68122 | 62462 | -8.31 |
| CITRUS HEIGHTS WATER DISTRICT | 65911 | 68912 | 4.55 |
| CALAM - SUBURBAN ROSEMONT | 53563 | 57897 | 8.09 |
| CALAM - PARKWAY | 48738 | 58635 | 20.31 |
| CALAM - LINCOLN OAKS | 47487 | 42916 | -9.63 |
| GOLDEN STATE WATER CO. - CORDOVA | 44928 | 48115 | 7.09 |
| ELK GROVE WATER SERVICE | 42540 | 42647 | 0.25 |
9 Considerations for Detailed Population Estimates
This section is in progress.
If you’re primarily only interested in population estimates (possibly including population by race/ethnicity, age, gender, etc.) and need an estimate that’s as geographically accurate as possible, it may make more sense to use the block-level population data from the decennial census rather than block group level population data from the ACS. However, since the decennial census only occurs once every 10 years, those estimates won’t reflect recent population changes (and will get especially less accurate as we get farther from the last decennial census). But keep in mind that even the 5-year ACS is an average that encompasses previous years’ estimates, so it’s not necessarily temporally precise either.
It’s also possible to use the block-level decennial population data as a weighing factor for ACS population data (to allocate the population within block-group level ACS data).
[TO DO: add example]
10 Considerations for Small / Rural Area Estimates
This section is in progress.
For some water systems, the estimated population using the areal interpolation above (Section 6.2) was at or near zero, and it may be useful to look at an example to see what’s going on with one of those cases.
(because the water system may encompass only a small portion of one or a few census units, and the entire census unit(s) may not be representative of the small portion(s)), especially those in rural environments (where population densities are lower, population centers tend to be spread out, and census units tend to be larger).
[TO DO: insert map]
From the map above [TO DO: insert map], you can see that the service area reported for some systems are very small, only covering a small fraction of a single census unit, resulting in a population estimate that is very low. In these cases, it could be that the system area was drawn incorrectly (i.e., maybe it doesn’t really depict the entire service area), in which case the reported service area should be revised. Or, it’s possible that the population within the given census unit is very un-evenly distributed and instead there’s a relatively high density population cluster in the depicted service area, in which case a more sophisticated method than an area-weighted average should be used (e.g., maybe consider the density of buildings, roads, and/or other features associated with inhabited areas).
11 Alternative Computation Methods
This section is in progress.
11.1 Simplified Method With MOE Estimates
As noted above, determining the margin of error (MOE) for estimates computed using areal weighted interpolation to aggregate portions of census units that overlap the target area of interest may not be possible (more research may be needed). If it’s necessary to compute MOEs for your aggregated values, and/or it’s preferable to use a simpler approach that doesn’t apply areal interpolation to assign fractional portions of census units to the target area, then a simplified method could be applied.
For guidance on how calculate MOEs for some types of derived estimates, see this document.
In this case, one option could be to use a minimum coverage threshold, where entire census units whose portion of area that overlaps the target area is greater than the threshold are treated as part of the target area, and any census units whose portion of area that overlaps the target area is less than the threshold are not treated as part of the target area. But, when using a minimum coverage threshold, some water systems may not have any census units that meet the coverage threshold, so they may need to be accountetd for separately (e.g., by selecting the overlapping census unit that has the greatest portion of overlap, as is done below), or those systems could be excluded from the calculation.
Because this approach operates on entire census units, the census bureau’s recommended approach for aggregating MOEs can be applied to produce an aggregated MOE. (However, keep in mind that the aggregated MOE applies to the uncertainty in the estimate for the census units included in the aggregation, and not may not necessarily capture the uncertainty in the estimate of the target area, since the two areas are now different – i.e., there is an additional unquantified element of uncertainty/error which is not reflected in the MOE due to this mismatch. In general, any estimate which attempts to compute census demographics for areas that don’t align with the census boundaries may have some element of un-quantifiable error – more research/input may be needed.)
tidycensus has functions for calculating MOEs for derived estimates based on Census-supplied formulas, including moe_sum(), moe_product(), moe_ratio(), and moe_prop().
Here’s an example calculation:
# define threshold value
overlap_threshold <- 0.5
# get census data (with MOEs)
census_data_acs_moe <- get_acs(geography = 'block group',
state = 'CA',
county = counties_list,
filter_by = water_systems_filter,
year = acs_year,
survey = 'acs5',
variables = census_vars_acs,
output = 'wide', # can be 'wide' or 'tidy'
geometry = TRUE,
cache_table = TRUE) %>%
st_transform(crs_projected) # convert to common coordinate system
# compute area of overlap for each census unit
simplified_calc <- census_data_acs_moe %>%
mutate(census_unit_area = st_area(.)) %>%
st_intersection(water_systems_sac %>%
select(water_system_name)) %>%
mutate(clipped_area = st_area(.)) %>%
mutate(overlap_portion = drop_units(clipped_area / census_unit_area)) %>%
mutate(geoid_system = paste(GEOID, water_system_name, sep = '|')) %>%
st_drop_geometry()
# determine which census units to include, based on threshold value
simplified_calc <- simplified_calc %>%
mutate(above_threshold = overlap_portion >= overlap_threshold)
# account for water systems with no census units that meet the threshold value
## get list of systems with at least 1 census unit above threshold
simplified_systems_above_threshold <- simplified_calc %>%
filter(above_threshold == TRUE) %>%
pull(water_system_name) %>%
unique()
## get list of systems with no census units above threshold
simplified_systems_below_threshold <- water_systems_sac %>%
filter(!water_system_name %in% simplified_systems_above_threshold) %>%
pull(water_system_name)
## select the 1 census unit per system with the greatest overlap
simplified_systems_below_threshold_keep <- simplified_calc %>%
filter(water_system_name %in% simplified_systems_below_threshold) %>%
group_by(water_system_name) %>%
slice_max(order_by = overlap_portion, n = 1) %>%
ungroup()
# filter census units based on threshold value (and account for water systems
# with no census units that meet the threshold value)
geoid_system_keep_above_threshold <- simplified_calc %>%
filter(above_threshold == TRUE) %>%
pull(geoid_system)
geoid_system_keep_below_threshold <- simplified_systems_below_threshold_keep %>%
pull(geoid_system)
census_data_acs_moe <- census_data_acs_moe %>%
st_join(water_systems_sac %>% select(water_system_name)) %>%
mutate(geoid_system = paste(GEOID, water_system_name, sep = '|')) %>%
filter(geoid_system %in% c(geoid_system_keep_above_threshold,
geoid_system_keep_below_threshold))
# aggregate
# [TO DO: compute aggregated values]
# simplified_calc_aggregate <- census_data_acs_moe %>%
# group_by(water_system_name, water_systems_filter) %>%
# {.} %>%
# ungroup()
# compute MOEs
# [TO DO - use tidycensus functions to calculate MOEs]Figure 7 shows the census units used in this simplified method to estimate demographics for Sacramento Suburban Water District.
Code
mapview(census_data_acs_moe %>%
filter(water_system_name == system_plot),
alpha.regions = 0.8,
col.regions = 'grey60',
color = 'cyan',
# lwd = 1.3,
label = 'NAME',
layer.name = 'ACS Data',
legend = FALSE) + # zcol = 'NAME'
mapview(water_systems_sac %>%
filter(water_system_name == system_plot),
alpha.regions = 0.3,
col.regions = 'darkblue',
color = 'black',
lwd = 1.3,
zcol = 'water_system_name',
# label = 'water_system_name',
layer.name = 'Water System Boundary',
legend = FALSE)While this approach may work well for relatively large water systems (where the size of the system is significantly greater than the census units used for the analysis), for smaller water systems this method might be somewhat more problematic, as shown in Figure 8.
Code
system_plot_small <- 'RIO LINDA/ELVERTA COMMUNITY WATER DIST'
mapview(census_data_acs_moe %>%
filter(water_system_name == system_plot_small),
alpha.regions = 0.8,
col.regions = 'grey60',
color = 'cyan',
# lwd = 1.3,
label = 'NAME',
layer.name = 'ACS Data',
legend = FALSE) + # zcol = 'NAME'
mapview(water_systems_sac %>%
filter(water_system_name == system_plot_small),
alpha.regions = 0.3,
col.regions = 'darkblue',
color = 'black',
lwd = 1.3,
zcol = 'water_system_name',
# label = 'water_system_name',
layer.name = 'Water System Boundary',
legend = FALSE)Figure 9 shows another example of a small system – in this case there are large block groups which the water system only overlaps a small portion of.
Code
system_plot_small <- 'RANCHO MURIETA COMMUNITY SERVI'
mapview(census_data_acs %>%
st_filter(water_systems_sac %>%
filter(water_system_name == system_plot_small)) %>%
filter(!GEOID %in% (census_data_acs_moe %>%
filter(water_system_name == system_plot_small) %>%
pull(GEOID))),
alpha.regions = 0.3,
col.regions = 'grey80',
color = 'grey',
# lwd = 1.3,
label = 'NAME',
layer.name = 'ACS Data - Not Used',
legend = FALSE) + # zcol = 'NAME'
mapview(census_data_acs_moe %>%
filter(water_system_name == system_plot_small),
alpha.regions = 0.8,
col.regions = 'grey60',
color = 'cyan',
# lwd = 1.3,
label = 'NAME',
layer.name = 'ACS Data - Used',
legend = FALSE) + # zcol = 'NAME'
mapview(water_systems_sac %>%
filter(water_system_name == system_plot_small),
alpha.regions = 0.3,
col.regions = 'darkblue',
color = 'black',
lwd = 1.3,
zcol = 'water_system_name',
# label = 'water_system_name',
layer.name = 'Water System Boundary',
legend = FALSE)11.2 Population Weighted Interpolation
The tidycensus package also has a function for population weighted interpolation, interpolate_pw, but it uses a somewhat different methodology than the population weighted interpolation procedure applied above in Section 6.3.
Note that some water systems may not get an estimated value using this method, even if NA values are removed from the source data first.
results_interpolate_pw <- interpolate_pw(from = census_data_acs %>%
# population_total_count median_household_income
filter(!is.na(population_total_count)) %>%
select(population_total_count),
to = water_systems_sac,
to_id = 'water_system_name',
extensive = TRUE, # use FALSE for median_household_income
weights = census_data_decennial,
# weight_placement = 'surface',
weight_column = 'population_total_count') %>%
# rename(median_household_income_interpolate_pw = median_household_income) # rename results field
rename(population_total_count_interpolate_pw = population_total_count) %>%
mutate(population_total_count_interpolate_pw = round(population_total_count_interpolate_pw,
0))
# sum(is.na(results_interpolate_pw$median_household_income_interpolate_pw))
# sum(is.na(results_interpolate_pw$population_total_count_interpolate_pw))This returns 17 NAs (it looks like those are small areas). Table 4 shows a comparison of the system populations estimated using interpolate_pw and the reported system populations:
results_interpolate_pw %>%
left_join(water_systems_sac %>%
st_drop_geometry() %>%
select(water_system_service_connections,
water_system_population_reported,
water_system_name),
by = 'water_system_name') %>%
relocate(water_system_population_reported,
.after = population_total_count_interpolate_pw) %>%
kable(caption = 'A Caption') %>%
scroll_box(height = "400px")| water_system_name | population_total_count_interpolate_pw | water_system_population_reported | water_system_service_connections | geometry |
|---|---|---|---|---|
| HOOD WATER MAINTENCE DIST [SWS] | 74 | 100 | 82 | MULTIPOLYGON (((-132703 403... |
| MC CLELLAN MHP | 412 | 700 | 199 | MULTIPOLYGON (((-119809.2 7... |
| MAGNOLIA MUTUAL WATER | 96 | 40 | 34 | MULTIPOLYGON (((-137015.5 3... |
| KORTHS PIRATES LAIR | NA | 150 | 64 | MULTIPOLYGON (((-137297.1 1... |
| EL DORADO MOBILE HOME PARK | 1031 | 256 | 128 | MULTIPOLYGON (((-124533.7 5... |
| RIVER'S EDGE MARINA & RESORT | NA | 150 | 83 | MULTIPOLYGON (((-141320.7 1... |
| LAGUNA VILLAGE RV PARK | NA | 32 | 28 | MULTIPOLYGON (((-122362.2 4... |
| SPINDRIFT MARINA | 13 | 100 | 50 | MULTIPOLYGON (((-140144.2 1... |
| SAC CITY MOBILE HOME COMMUNITY LP | NA | 350 | 164 | MULTIPOLYGON (((-124542.8 5... |
| ORANGE VALE WATER COMPANY | 17910 | 18005 | 5684 | MULTIPOLYGON (((-104831.8 7... |
| GOLDEN STATE WATER CO. - CORDOVA | 48645 | 44928 | 14798 | MULTIPOLYGON (((-105024.5 6... |
| HOLIDAY MOBILE VILLAGE | NA | 200 | 115 | MULTIPOLYGON (((-123885.3 5... |
| SOUTHWEST TRACT W M D [SWS] | 183 | 150 | 33 | MULTIPOLYGON (((-125846.9 5... |
| CARMICHAEL WATER DISTRICT | 39773 | 37897 | 11704 | MULTIPOLYGON (((-112199 690... |
| SCWA - ARDEN PARK VISTA | 9617 | 10035 | 3043 | MULTIPOLYGON (((-120321.1 6... |
| SCWA - LAGUNA/VINEYARD | 157782 | 172666 | 47411 | MULTIPOLYGON (((-125203.9 4... |
| RIO COSUMNES CORRECTIONAL CENTER [SWS] | NA | 2800 | 13 | MULTIPOLYGON (((-124028.8 3... |
| SCWA MATHER-SUNRISE | 19629 | 22839 | 6921 | MULTIPOLYGON (((-108416.1 5... |
| TUNNEL TRAILER PARK | NA | 44 | 21 | MULTIPOLYGON (((-136156.2 2... |
| IMPERIAL MANOR MOBILEHOME COMMUNITY | 242 | 200 | 186 | MULTIPOLYGON (((-115157 740... |
| CALAM - ISLETON | 519 | 1581 | 480 | MULTIPOLYGON (((-138559.3 1... |
| FOLSOM, CITY OF - ASHLAND | 3719 | 3538 | 1079 | MULTIPOLYGON (((-100505 774... |
| LOCKE WATER WORKS CO [SWS] | 76 | 80 | 44 | MULTIPOLYGON (((-131976.7 2... |
| DEL PASO MANOR COUNTY WATER DI | 5784 | 4520 | 1796 | MULTIPOLYGON (((-120274.9 6... |
| EAST WALNUT GROVE [SWS] | 347 | 300 | 166 | MULTIPOLYGON (((-132506.6 2... |
| FOLSOM STATE PRISON | 32 | 9703 | 2790 | MULTIPOLYGON (((-99838.11 7... |
| CALAM - ARDEN | 11512 | 3908 | 1185 | MULTIPOLYGON (((-123013.9 6... |
| EDGEWATER MOBILE HOME PARK | NA | 40 | 22 | MULTIPOLYGON (((-153715.8 7... |
| CALAM - LINCOLN OAKS | 44168 | 47487 | 14390 | MULTIPOLYGON (((-113737.5 7... |
| VIEIRA'S RESORT, INC | 67 | 150 | 107 | MULTIPOLYGON (((-143708.1 1... |
| FLORIN COUNTY WATER DISTRICT | 11114 | 7831 | 2323 | MULTIPOLYGON (((-119638.3 5... |
| WESTERNER MOBILE HOME PARK | 20 | 65 | 49 | MULTIPOLYGON (((-122693.1 4... |
| EL DORADO WEST MHP | 227 | 172 | 128 | MULTIPOLYGON (((-124742.1 5... |
| TOKAY PARK WATER CO | 530 | 525 | 198 | MULTIPOLYGON (((-122789.8 5... |
| LAGUNA DEL SOL INC | NA | 470 | 112 | MULTIPOLYGON (((-105025.5 4... |
| OLYMPIA MOBILODGE | 176 | 450 | 200 | MULTIPOLYGON (((-123549.9 5... |
| GOLDEN STATE WATER CO - ARDEN WATER SERV | 6516 | 5125 | 1716 | MULTIPOLYGON (((-120307.3 6... |
| ELEVEN OAKS MOBILE HOME COMMUNITY | 368 | 262 | 136 | MULTIPOLYGON (((-119816 721... |
| CALAM - ANTELOPE | 36641 | 34720 | 10528 | MULTIPOLYGON (((-123334.7 8... |
| CALIFORNIA STATE FAIR | NA | 650 | 269 | MULTIPOLYGON (((-125658 651... |
| PLANTATION MOBILE HOME PARK | NA | 44 | 44 | MULTIPOLYGON (((-124338.4 5... |
| CALAM - PARKWAY | 57391 | 48738 | 14779 | MULTIPOLYGON (((-123335.4 5... |
| CITRUS HEIGHTS WATER DISTRICT | 69931 | 65911 | 19940 | MULTIPOLYGON (((-113645.4 7... |
| SEQUOIA WATER ASSOC | NA | 54 | 18 | MULTIPOLYGON (((-136917.7 3... |
| FAIR OAKS WATER DISTRICT | 38819 | 35114 | 14293 | MULTIPOLYGON (((-107763.5 6... |
| RANCHO MARINA | NA | 250 | 77 | MULTIPOLYGON (((-138162.3 1... |
| FREEPORT MARINA | 105 | 42 | 27 | MULTIPOLYGON (((-130871.1 5... |
| HAPPY HARBOR (SWS) | NA | 60 | 45 | MULTIPOLYGON (((-139870.3 1... |
| FOLSOM, CITY OF - MAIN | 65206 | 68122 | 21424 | MULTIPOLYGON (((-107067.9 6... |
| RANCHO MURIETA COMMUNITY SERVI | 4853 | 5744 | 2726 | MULTIPOLYGON (((-95334.2 52... |
| CAL AM FRUITRIDGE VISTA | 21116 | 15385 | 4667 | MULTIPOLYGON (((-125839.3 5... |
| B & W RESORT MARINA | NA | 100 | 37 | MULTIPOLYGON (((-138335.4 1... |
| CALAM - SUBURBAN ROSEMONT | 60288 | 53563 | 16238 | MULTIPOLYGON (((-120054.6 5... |
| SAN JUAN WATER DISTRICT | 30997 | 29641 | 10672 | MULTIPOLYGON (((-100141.1 8... |
| ELK GROVE WATER SERVICE | 42652 | 42540 | 12882 | MULTIPOLYGON (((-113074.8 4... |
| DELTA CROSSING MHP | NA | 30 | 22 | MULTIPOLYGON (((-132728.4 4... |
| GALT, CITY OF | 26962 | 26536 | 7471 | MULTIPOLYGON (((-115624.6 2... |
| LINCOLN CHAN-HOME RANCH | NA | 33 | 19 | MULTIPOLYGON (((-137040.3 3... |
| RIO LINDA/ELVERTA COMMUNITY WATER DIST | 15102 | 14381 | 4621 | MULTIPOLYGON (((-125758.1 7... |
| SACRAMENTO SUBURBAN WATER DISTRICT | 190956 | 184385 | 46573 | MULTIPOLYGON (((-115313.2 7... |
| CALAM - WALNUT GROVE | 388 | 651 | 197 | MULTIPOLYGON (((-132035.3 2... |
| CITY OF SACRAMENTO MAIN | 525914 | 510931 | 142794 | MULTIPOLYGON (((-128741.8 5... |
12 Working with Other Source Datasets
This section is in progress.
In addition to using census data, it’s possible to use other types of source datasets to compute characteristics of custom target areas like water systems. The process is generally likely to be similar to the processes shown above using census data, but each source dataset may require unique considerations (e.g., to handle missing values, uncertain boundaries, etc.).
12.1 CalEnviroScreen
[TO DO: example computation of weighted average CES scores]
Notes to consider:
Some census tracts are missing CES scores (overall and/or for certain indicators), and have to deal with those missing values somehow
CES 4.0 is tract-level data, and uses 2010 census boundaries (so boundaries won’t match current ACS or decennial boundaries)
CES 4.0 boundaries are simplified, and boundaries between tracts are not consistent – for some types of analysis (especially when looking at point data - e.g., facilities), it may be better to use the original TIGER dataset (available from either the
tidycensusortigrisR packages)